
A Detailed Walkthrough of Implementing Audio Mixing in Code

Source: 程序員人生 | Published: 2016-07-26 13:18:42 | Views: 2999

Author: 鄭童宇
GitHub: https://github.com/CrazyZty

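1. Introduction

Audio mixing is widely used in everyday applications, and plenty of explanations and code samples can be found online. In my experience, though, few of those articles explain the topic thoroughly, which is why I wrote this post. By walking through a code implementation, I hope to share my understanding of audio mixing and leave readers with a clear picture of the whole process.
This post uses Java as the example language and Android as the example platform.
It focuses on the principles behind audio mixing and on the details and pitfalls along the way, so that you can understand how to implement the feature regardless of programming language.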

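2. Audio Mixing

2.1. Feature Overview

The feature implemented here is modeled on the audio mixing in the app "唱吧". The workflow is: record the voice to a PCM file; decode the background music file and trim it to the recording's duration, converting the decoded audio to the same sample rate, bytes per sample, and channel count as the recording; adjust the volume of the two audio files with the given weights and mix them into PCM data; and finally compress and encode the result into an MP3 file.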

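2.2. Implementation

2.2.1. Recording

The recording step produces audio in PCM format. Wikipedia defines PCM as follows: "Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, Compact Discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps." In other words, PCM is a way of sampling an analog signal and is the standard form of digital audio in today's applications; it works by sampling the signal's amplitude at uniform intervals and quantizing each sample to the nearest value in a given numeric range.
A PCM file stores raw, uncompressed audio data. To make that concrete, compare it with the familiar MP3 format: an MP3 file stores compressed audio, and, put simply, running PCM data through the MP3 compression algorithm yields an MP3 file. Now compare the storage cost. One minute of 16-bit, stereo, 44.1 kHz PCM takes 60 × (16 / 8) × 2 × 44,100 / 1024 = 10,335.9375 KB, roughly 10 MB, while the corresponding 128 kbps MP3 is only about 1 MB.
Given how much space PCM takes, should we avoid storing the recording as PCM? Quite the opposite. Recall the first sentence: "A PCM file stores raw, uncompressed audio data." That means only PCM data can be processed directly, for example to adjust the volume or apply audio filters. Audio in any other encoding must be decoded first (even a PCM-encoded WAV file requires reading the file header first). That does not make raw PCM files convenient, however. Because they have no header, we must know the channel count, bytes per sample, sample rate, and byte order before we can process or play them, which is usually impossible for an arbitrary file. As far as I know, no ordinary player plays a raw PCM file directly. In our case we define all the recording parameters ourselves, so this is not a concern.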
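The size comparison above can be checked with a few lines of Java. This is a minimal sketch; the class and method names (AudioSizeEstimate, pcmBytes, cbrBytes) are made up for illustration and are not part of the original project.

```java
/** Back-of-the-envelope storage estimates for raw PCM and constant-bitrate MP3. */
final class AudioSizeEstimate {

    /** Raw PCM size in bytes: seconds * sampleRate * channels * bytesPerSample. */
    static long pcmBytes(int seconds, int sampleRateHz, int channels, int bitsPerSample) {
        return (long) seconds * sampleRateHz * channels * (bitsPerSample / 8);
    }

    /** Constant-bitrate stream size in bytes: seconds * bitsPerSecond / 8. */
    static long cbrBytes(int seconds, int bitsPerSecond) {
        return (long) seconds * bitsPerSecond / 8;
    }

    public static void main(String[] args) {
        long pcm = pcmBytes(60, 44_100, 2, 16);   // one minute, 44.1 kHz, stereo, 16-bit
        long mp3 = cbrBytes(60, 128_000);         // one minute at 128 kbps
        System.out.println("PCM: " + pcm / 1024.0 + " KB"); // prints "PCM: 10335.9375 KB"
        System.out.println("MP3: " + mp3 / 1024.0 + " KB"); // prints "MP3: 937.5 KB"
    }
}
```

The MP3 figure ignores headers and tags, so a real file will be slightly larger.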
That is enough background. The implementation follows, with explanations alongside the code.

if (recordVoice) {
    audioRecord = new AudioRecord(MediaRecorder.AudioSource.MIC,
            Constant.RecordSampleRate,
            AudioFormat.CHANNEL_IN_MONO,
            pcmFormat.getAudioFormat(), audioRecordBufferSize);

    try {
        audioRecord.startRecording();
    } catch (Exception e) {
        // A failure to start is treated as a missing recording permission.
        NoRecordPermission();
        continue; // this excerpt runs inside an enclosing loop that is not shown
    }

    BufferedOutputStream bufferedOutputStream = FileFunction
            .GetBufferedOutputStreamFromFile(recordFileUrl);

    // Keep reading raw samples from the microphone until recording is stopped.
    while (recordVoice) {
        int audioRecordReadDataSize =
                audioRecord.read(audioRecordBuffer, 0, audioRecordBufferSize);

        if (audioRecordReadDataSize > 0) {
            calculateRealVolume(audioRecordBuffer, audioRecordReadDataSize);

            if (bufferedOutputStream != null) {
                try {
                    // Convert the samples to bytes in the configured byte order.
                    byte[] outputByteArray = CommonFunction
                            .GetByteBuffer(audioRecordBuffer,
                                    audioRecordReadDataSize, Variable.isBigEnding);
                    bufferedOutputStream.write(outputByteArray);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        } else {
            // No data could be read: also treated as a missing permission.
            NoRecordPermission();
            continue;
        }
    }

    if (bufferedOutputStream != null) {
        try {
            bufferedOutputStream.close();
        } catch (Exception e) {
            LogFunction.error("Failed to close the recording output stream", e);
        }
    }

    audioRecord.stop();
    audioRecord.release();
    audioRecord = null;
}

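The full recording implementation and its control code are lengthy, so only the core recording code is shown here. To obtain the raw recording data I used Android's native AudioRecord; other platforms generally provide a similar utility class. Once recording starts, the app repeatedly reads raw audio data from the microphone at the configured sample rate, channel count, and bytes per sample, and writes it to the specified file until recording ends. The logic is straightforward, so I won't go into more detail.
As for pitfalls: on mobile platforms the app must request the recording permission. Without it, no valid recording file can be produced.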

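2.2.2. Decoding and Trimming the Background Music

As noted above, audio in any encoding other than PCM must be decoded before it can be processed, so the background music has to be decoded before it can take part in mixing. To keep the resulting MP3 small, the decoded audio is also trimmed to the recording's duration. This section does not explain decoding algorithms in detail, because every platform provides wrapped utility classes that can be used directly.
That covers the background. This step has quite a few pitfalls; the implementation follows, with explanations alongside the code.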

private boolean decodeMusicFile(String musicFileUrl, String decodeFileUrl,
        int startSecond, int endSecond, Handler handler,
        DecodeOperateInterface decodeOperateInterface) {
    int sampleRate = 0;
    int channelCount = 0;
    long duration = 0;
    String mime = null;

    MediaExtractor mediaExtractor = new MediaExtractor();
    MediaFormat mediaFormat = null;
    MediaCodec mediaCodec = null;

    try {
        mediaExtractor.setDataSource(musicFileUrl);
    } catch (Exception e) {
        LogFunction.error("Failed to set the path of the audio file to decode", e);
        return false;
    }

    // Read the track metadata, falling back to defaults when a key is missing.
    mediaFormat = mediaExtractor.getTrackFormat(0);
    sampleRate = mediaFormat.containsKey(MediaFormat.KEY_SAMPLE_RATE)
            ? mediaFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE) : 44100;
    channelCount = mediaFormat.containsKey(MediaFormat.KEY_CHANNEL_COUNT)
            ? mediaFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT) : 1;
    duration = mediaFormat.containsKey(MediaFormat.KEY_DURATION)
            ? mediaFormat.getLong(MediaFormat.KEY_DURATION) : 0;
    mime = mediaFormat.containsKey(MediaFormat.KEY_MIME)
            ? mediaFormat.getString(MediaFormat.KEY_MIME) : "";

    LogFunction.log("Track info", "mime:" + mime + " sampleRate:" + sampleRate
            + " channels:" + channelCount + " duration:" + duration);

    if (CommonFunction.isEmpty(mime) || !mime.startsWith("audio/")) {
        LogFunction.error("The file to decode is not an audio file", "mime:" + mime);
        return false;
    }

    // Compatibility workaround: some devices report "audio/ffmpeg" for MP3 tracks.
    if (mime.equals("audio/ffmpeg")) {
        mime = "audio/mpeg";
        mediaFormat.setString(MediaFormat.KEY_MIME, mime);
    }

    try {
        mediaCodec = MediaCodec.createDecoderByType(mime);
        mediaCodec.configure(mediaFormat, null, null, 0);
    } catch (Exception e) {
        LogFunction.error("Failed to configure the decoder", e);
        return false;
    }

    getDecodeData(mediaExtractor, mediaCodec, decodeFileUrl, sampleRate, channelCount,
            startSecond, endSecond, handler, decodeOperateInterface);
    return true;
}

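The decodeMusicFile method reads the background music's metadata, initializes the decoder, and finally calls getDecodeData to start processing the background music.
The code uses Android's native utility classes as the decoder. Even with the native decoder I ran into compatibility problems and had to add some workarounds; the many customized Android builds cause a great number of compatibility issues.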

private void getDecodeData(MediaExtractor mediaExtractor, MediaCodec mediaCodec,
        String decodeFileUrl, int sampleRate, int channelCount,
        int startSecond, int endSecond, Handler handler,
        final DecodeOperateInterface decodeOperateInterface) {
    boolean decodeInputEnd = false;
    boolean decodeOutputEnd = false;

    int sampleDataSize;
    int inputBufferIndex;
    int outputBufferIndex;
    int byteNumber;

    long decodeNoticeTime = System.currentTimeMillis();
    long decodeTime;
    long presentationTimeUs = 0;

    final long timeOutUs = 100;
    // Multiply as long so that large second values cannot overflow int.
    final long startMicroseconds = startSecond * 1000L * 1000L;
    final long endMicroseconds = endSecond * 1000L * 1000L;

    ByteBuffer[] inputBuffers;
    ByteBuffer[] outputBuffers;
    ByteBuffer sourceBuffer;
    ByteBuffer targetBuffer;

    MediaFormat outputFormat = mediaCodec.getOutputFormat();
    MediaCodec.BufferInfo bufferInfo;

    byteNumber = (outputFormat.containsKey("bit-width")
            ? outputFormat.getInteger("bit-width") : 0) / 8;

    mediaCodec.start();

    inputBuffers = mediaCodec.getInputBuffers();
    outputBuffers = mediaCodec.getOutputBuffers();

    mediaExtractor.selectTrack(0);

    bufferInfo = new MediaCodec.BufferInfo();

    BufferedOutputStream bufferedOutputStream = FileFunction
            .GetBufferedOutputStreamFromFile(decodeFileUrl);

    while (!decodeOutputEnd) {
        if (decodeInputEnd) {
            // Input is exhausted. Leave the loop (instead of returning) so that the
            // stream is closed and the codec is released below.
            break;
        }

        // Report progress to the UI at most once per second.
        decodeTime = System.currentTimeMillis();
        if (decodeTime - decodeNoticeTime > Constant.OneSecond) {
            final int decodeProgress = (int) ((presentationTimeUs - startMicroseconds)
                    * Constant.NormalMaxProgress / endMicroseconds);
            if (decodeProgress > 0) {
                handler.post(new Runnable() {
                    @Override
                    public void run() {
                        decodeOperateInterface.updateDecodeProgress(decodeProgress);
                    }
                });
            }
            decodeNoticeTime = decodeTime;
        }

        try {
            // Feed one encoded sample from the extractor into the codec.
            inputBufferIndex = mediaCodec.dequeueInputBuffer(timeOutUs);
            if (inputBufferIndex >= 0) {
                sourceBuffer = inputBuffers[inputBufferIndex];
                sampleDataSize = mediaExtractor.readSampleData(sourceBuffer, 0);

                if (sampleDataSize < 0) {
                    decodeInputEnd = true;
                    sampleDataSize = 0;
                } else {
                    presentationTimeUs = mediaExtractor.getSampleTime();
                }

                mediaCodec.queueInputBuffer(inputBufferIndex, 0, sampleDataSize,
                        presentationTimeUs,
                        decodeInputEnd ? MediaCodec.BUFFER_FLAG_END_OF_STREAM : 0);

                if (!decodeInputEnd) {
                    mediaExtractor.advance();
                }
            } else {
                LogFunction.error("inputBufferIndex", "" + inputBufferIndex);
            }

            // Fetch decoded PCM data from the codec.
            outputBufferIndex = mediaCodec.dequeueOutputBuffer(bufferInfo, timeOutUs);
            if (outputBufferIndex < 0) {
                switch (outputBufferIndex) {
                    case MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED:
                        outputBuffers = mediaCodec.getOutputBuffers();
                        LogFunction.error("MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED",
                                "[AudioDecoder]output buffers have changed.");
                        break;
                    case MediaCodec.INFO_OUTPUT_FORMAT_CHANGED:
                        // The real output format may differ from the track metadata.
                        outputFormat = mediaCodec.getOutputFormat();
                        sampleRate = outputFormat.containsKey(MediaFormat.KEY_SAMPLE_RATE)
                                ? outputFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE)
                                : sampleRate;
                        channelCount = outputFormat.containsKey(MediaFormat.KEY_CHANNEL_COUNT)
                                ? outputFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT)
                                : channelCount;
                        byteNumber = (outputFormat.containsKey("bit-width")
                                ? outputFormat.getInteger("bit-width") : 0) / 8;
                        LogFunction.error("MediaCodec.INFO_OUTPUT_FORMAT_CHANGED",
                                "[AudioDecoder]output format has changed to "
                                        + mediaCodec.getOutputFormat());
                        break;
                    default:
                        LogFunction.error("error",
                                "[AudioDecoder] dequeueOutputBuffer returned "
                                        + outputBufferIndex);
                        break;
                }
                continue;
            }

            targetBuffer = outputBuffers[outputBufferIndex];
            byte[] sourceByteArray = new byte[bufferInfo.size];
            targetBuffer.get(sourceByteArray);
            targetBuffer.clear();

            mediaCodec.releaseOutputBuffer(outputBufferIndex, false);

            if ((bufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                decodeOutputEnd = true;
            }

            if (sourceByteArray.length > 0 && bufferedOutputStream != null) {
                // Trim: skip everything before the start time.
                if (presentationTimeUs < startMicroseconds) {
                    continue;
                }

                // Match the recording's bytes per sample and channel count.
                byte[] convertByteNumberByteArray = ConvertByteNumber(byteNumber,
                        Constant.RecordByteNumber, sourceByteArray);
                byte[] resultByteArray = ConvertChannelNumber(channelCount,
                        Constant.RecordChannelNumber, Constant.RecordByteNumber,
                        convertByteNumberByteArray);

                try {
                    bufferedOutputStream.write(resultByteArray);
                } catch (Exception e) {
                    LogFunction.error("Failed to write the decoded audio data", e);
                }
            }

            // Trim: stop once the end time has been passed.
            if (presentationTimeUs > endMicroseconds) {
                break;
            }
        } catch (Exception e) {
            LogFunction.error("Exception in getDecodeData", e);
        }
    }

    if (bufferedOutputStream != null) {
        try {
            bufferedOutputStream.close();
        } catch (IOException e) {
            LogFunction.error("Failed to close bufferedOutputStream", e);
        }
    }

    // Match the recording's sample rate.
    if (sampleRate != Constant.RecordSampleRate) {
        Resample(sampleRate, decodeFileUrl);
    }

    if (mediaCodec != null) {
        mediaCodec.stop();
        mediaCodec.release();
    }

    if (mediaExtractor != null) {
        mediaExtractor.release();
    }
}

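The getDecodeData method is the core of decoding and trimming. Among its parameters, mediaExtractor and mediaCodec do the actual work on the background music's audio data; decodeFileUrl is the storage path of the decoded and trimmed PCM file; sampleRate and channelCount are the background music's sample rate and channel count; startSecond is the start time of the trim, currently 0 by default; and endSecond is the end time of the trim, determined directly by the recording's duration.
getDecodeData repeatedly feeds the background music's encoded data into mediaCodec and reads the decoded data back from its output buffers. Because mediaCodec's read-and-decode API is platform-specific, I won't describe it at length. During decoding, startSecond and endSecond control where the decoded output starts and ends.
As described, decoding and trimming are fairly simple: decode the background music with the platform's utility classes, then use those two variables to cut out the requested span of decoded audio and write it to an external file. Once that flow completes, the feature is done. There are, however, several pitfalls along the way.
First, to mix the two, the recording file and the decoded file must have the same sample rate, bytes per sample, and channel count. Since these three parameters are fixed for the recording, the decoded audio must be converted so that the final decoded file matches the recording. The post at http://blog.csdn.net/ownwell/article/details/8114121/ describes the four common storage layouts of PCM data: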
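Format          Byte 1                    Byte 2                    Byte 3                    Byte 4
8-bit mono      channel 0                 channel 0                 channel 0                 channel 0
8-bit stereo    channel 0 (left)          channel 1 (right)         channel 0 (left)          channel 1 (right)
16-bit mono     channel 0 (low byte)      channel 0 (high byte)     channel 0 (low byte)      channel 0 (high byte)
16-bit stereo   channel 0 (left, low)     channel 0 (left, high)    channel 1 (right, low)    channel 1 (right, high)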
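With this knowledge, we can write code that converts audio data in a known format to a different number of bytes per sample and a different channel count.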
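To make the 16-bit stereo row of the layout table concrete, here is a minimal sketch that splits interleaved samples into separate left and right channels. It assumes low-byte-first (little-endian) data as in the table; the class and method names are hypothetical and do not come from the original project.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Reads interleaved 16-bit stereo PCM laid out as L-low, L-high, R-low, R-high. */
final class PcmLayoutDemo {

    /** Returns {left, right}; trailing bytes that do not form a whole frame are ignored. */
    static short[][] deinterleave16BitStereo(byte[] pcm) {
        int frames = pcm.length / 4; // 2 channels * 2 bytes per sample
        short[] left = new short[frames];
        short[] right = new short[frames];
        ByteBuffer buffer = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < frames; i++) {
            left[i] = buffer.getShort();
            right[i] = buffer.getShort();
        }
        return new short[][] {left, right};
    }

    public static void main(String[] args) {
        // Two frames: (1, -1) and (256, -32768).
        byte[] pcm = {0x01, 0x00, (byte) 0xFF, (byte) 0xFF,
                      0x00, 0x01, 0x00, (byte) 0x80};
        short[][] channels = deinterleave16BitStereo(pcm);
        System.out.println(channels[0][0] + ", " + channels[1][0]); // prints "1, -1"
        System.out.println(channels[0][1] + ", " + channels[1][1]); // prints "256, -32768"
    }
}
```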
The ConvertByteNumber method called from getDecodeData processes the audio data so that the decoded file has the same number of bytes per sample as the recording.

private static byte[] ConvertByteNumber(int sourceByteNumber, int outputByteNumber,
        byte[] sourceByteArray) {
    if (sourceByteNumber == outputByteNumber) {
        return sourceByteArray;
    }

    int sourceByteArrayLength = sourceByteArray.length;
    byte[] byteArray;

    switch (sourceByteNumber) {
        case 1:
            switch (outputByteNumber) {
                case 2:
                    // 8-bit -> 16-bit: scale each sample by 256 and emit two bytes.
                    byteArray = new byte[sourceByteArrayLength * 2];
                    byte[] resultByte;
                    for (int index = 0; index < sourceByteArrayLength; index += 1) {
                        resultByte = CommonFunction.GetBytes(
                                (short) (sourceByteArray[index] * 256),
                                Variable.isBigEnding);
                        byteArray[2 * index] = resultByte[0];
                        byteArray[2 * index + 1] = resultByte[1];
                    }
                    return byteArray;
            }
            break;
        case 2:
            switch (outputByteNumber) {
                case 1:
                    // 16-bit -> 8-bit: divide each sample by 256 and keep one byte.
                    int outputByteArrayLength = sourceByteArrayLength / 2;
                    byteArray = new byte[outputByteArrayLength];
                    for (int index = 0; index < outputByteArrayLength; index += 1) {
                        byteArray[index] = (byte) (CommonFunction.GetShort(
                                sourceByteArray[2 * index],
                                sourceByteArray[2 * index + 1],
                                Variable.isBigEnding) / 256);
                    }
                    return byteArray;
            }
            break;
    }

    // Unsupported combination: return the data unchanged.
    return sourceByteArray;
}

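In ConvertByteNumber, sourceByteNumber is the background music's bytes per sample and outputByteNumber is the recording's. If the two are equal, nothing is done; otherwise the data is converted according to the background music's bytes per sample. The method handles only one-byte and two-byte samples; contributions covering other sample sizes are welcome on GitHub.
The ConvertChannelNumber method called from getDecodeData processes the audio data so that the decoded file has the same channel count as the recording.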

private static byte[] ConvertChannelNumber(int sourceChannelCount, int outputChannelCount,
        int byteNumber, byte[] sourceByteArray) {
    if (sourceChannelCount == outputChannelCount) {
        return sourceByteArray;
    }

    // Only 8-bit and 16-bit samples are supported.
    switch (byteNumber) {
        case 1:
        case 2:
            break;
        default:
            return sourceByteArray;
    }

    int sourceByteArrayLength = sourceByteArray.length;
    byte[] byteArray;

    switch (sourceChannelCount) {
        case 1:
            switch (outputChannelCount) {
                case 2:
                    // Mono -> stereo: copy each sample into both channels.
                    byteArray = new byte[sourceByteArrayLength * 2];
                    byte firstByte;
                    byte secondByte;
                    switch (byteNumber) {
                        case 1:
                            for (int index = 0; index < sourceByteArrayLength; index += 1) {
                                firstByte = sourceByteArray[index];
                                byteArray[2 * index] = firstByte;
                                byteArray[2 * index + 1] = firstByte;
                            }
                            break;
                        case 2:
                            for (int index = 0; index < sourceByteArrayLength; index += 2) {
                                firstByte = sourceByteArray[index];
                                secondByte = sourceByteArray[index + 1];
                                byteArray[2 * index] = firstByte;
                                byteArray[2 * index + 1] = secondByte;
                                byteArray[2 * index + 2] = firstByte;
                                byteArray[2 * index + 3] = secondByte;
                            }
                            break;
                    }
                    return byteArray;
            }
            break;
        case 2:
            switch (outputChannelCount) {
                case 1:
                    // Stereo -> mono: average the left and right samples.
                    int outputByteArrayLength = sourceByteArrayLength / 2;
                    byteArray = new byte[outputByteArrayLength];
                    switch (byteNumber) {
                        case 1:
                            // Step by 1: every output byte comes from one input pair.
                            // (Stepping by 2 here would leave every other byte at 0.)
                            for (int index = 0; index < outputByteArrayLength; index += 1) {
                                short averageNumber = (short) ((short) sourceByteArray[2 * index]
                                        + (short) sourceByteArray[2 * index + 1]);
                                byteArray[index] = (byte) (averageNumber >> 1);
                            }
                            break;
                        case 2:
                            for (int index = 0; index < outputByteArrayLength; index += 2) {
                                byte[] resultByte = CommonFunction.AverageShortByteArray(
                                        sourceByteArray[2 * index],
                                        sourceByteArray[2 * index + 1],
                                        sourceByteArray[2 * index + 2],
                                        sourceByteArray[2 * index + 3],
                                        Variable.isBigEnding);
                                byteArray[index] = resultByte[0];
                                byteArray[index + 1] = resultByte[1];
                            }
                            break;
                    }
                    return byteArray;
            }
            break;
    }

    // Unsupported combination: return the data unchanged.
    return sourceByteArray;
}

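In ConvertChannelNumber, sourceChannelCount is the background music's channel count and outputChannelCount is the recording's. If the two are equal, nothing is done; otherwise the data is converted according to the channel counts and the bytes per sample. The method handles only mono and two-channel audio; contributions covering other channel layouts are welcome on GitHub.
The Resample method called from getDecodeData processes the audio data so that the decoded file has the same sample rate as the recording.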

private static void Resample(int sampleRate, String decodeFileUrl) {
    // Resample into a temporary file, then rename it over the original.
    String newDecodeFileUrl = decodeFileUrl + "new";

    try {
        FileInputStream fileInputStream = new FileInputStream(new File(decodeFileUrl));
        FileOutputStream fileOutputStream = new FileOutputStream(new File(newDecodeFileUrl));

        new SSRC(fileInputStream, fileOutputStream, sampleRate, Constant.RecordSampleRate,
                Constant.RecordByteNumber, Constant.RecordByteNumber,
                1, Integer.MAX_VALUE, 0, 0, true);

        fileInputStream.close();
        fileOutputStream.close();

        FileFunction.RenameFile(newDecodeFileUrl, decodeFileUrl);
    } catch (IOException e) {
        LogFunction.error("Exception while resampling the decoded file", e);
    }
}

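To change the sample rate I used a Java implementation of SSRC. One description of SSRC found online reads: "SSRC = Synchronous Sample Rate Converter. Put plainly, it can only convert by integer multiples and does not support conversion between arbitrary rates, such as 44.1 kHz <-> 48 kHz." Different SSRC implementations work differently, however. I used the Java port of https://github.com/shibatch/SSRC. From a brief read of that port's source, it decides whether resampling is possible by checking whether the greatest common divisor of the source and target sample rates satisfies certain conditions. It supports common non-integer-ratio conversions such as 44.1 kHz <-> 48 kHz, but if the target rate is unusual, for example a large prime number, resampling is not supported.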
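The role of the greatest common divisor can be illustrated with plain arithmetic. The sketch below is not SSRC's actual check, whose exact conditions are not reproduced here; it only shows how two sample rates reduce to a ratio of small or large integers. The class and method names are hypothetical.

```java
/** Reduces a pair of sample rates to its lowest-terms conversion ratio. */
final class SampleRateRatio {

    /** Euclid's algorithm. */
    static int gcd(int a, int b) {
        while (b != 0) {
            int remainder = a % b;
            a = b;
            b = remainder;
        }
        return a;
    }

    /** Returns {sourceRate / g, targetRate / g}, where g is the gcd of the two rates. */
    static int[] reducedRatio(int sourceRate, int targetRate) {
        int g = gcd(sourceRate, targetRate);
        return new int[] {sourceRate / g, targetRate / g};
    }

    public static void main(String[] args) {
        int[] common = reducedRatio(44_100, 48_000);
        System.out.println(common[0] + ":" + common[1]); // prints "147:160"

        // 7919 is prime, so nothing cancels and the ratio stays large.
        int[] awkward = reducedRatio(44_100, 7_919);
        System.out.println(awkward[0] + ":" + awkward[1]); // prints "44100:7919"
    }
}
```

A large common divisor (300 for 44.1 kHz and 48 kHz) gives a small ratio, which is the kind of case the paragraph above says is supported; a prime target rate shares no divisor with the source rate.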
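At this point, the three methods Resample, ConvertByteNumber, and ConvertChannelNumber together ensure that the decoded file and the recording file have the same sample rate, bytes per sample, and channel count.
The second pitfall is byte order. Anyone familiar with computer architecture will know the terms big-endian and little-endian: they are the two orders in which the bytes of a multi-byte value can be laid out in memory. If the concept is unfamiliar, see the discussion at http://blog.jobbole.com/102432/. The audio-processing methods above take a parameter Variable.isBigEnding, which indicates whether the current platform uses big-endian encoding. You may wonder why the byte order of multi-byte data affects audio processing. Take an example: to convert 8-bit samples to 16-bit samples, the current approach multiplies each sample by 256, which turns each byte into a short whose high byte holds the original byte and whose low byte is 0. The question then is whether the high byte goes at the higher or the lower address, and that depends on the byte order in use. If our output type were short we would not need to care, because Java handles it for us; but our output is a byte array, so we have to order the bytes ourselves.
This is easy to overlook, because in ordinary software development we rarely need to think about byte order. Here it must be handled; otherwise the mixed audio will fail to play on some platforms.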
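The source of CommonFunction.GetBytes and CommonFunction.GetShort is not shown in this post, so the following is only a sketch of what such helpers might look like, under the assumption that the boolean flag selects big-endian output. The class and method names are hypothetical.

```java
/** Converts between a 16-bit sample and its two-byte representation. */
final class EndianDemo {

    /** Splits a sample into two bytes, high byte first when bigEndian is true. */
    static byte[] getBytes(short value, boolean bigEndian) {
        byte high = (byte) (value >> 8);
        byte low = (byte) value;
        return bigEndian ? new byte[] {high, low} : new byte[] {low, high};
    }

    /** Rebuilds a sample from two bytes stored in the given byte order. */
    static short getShort(byte first, byte second, boolean bigEndian) {
        int high = bigEndian ? first : second;
        int low = bigEndian ? second : first;
        // Mask the low byte so its sign bit does not leak into the high byte.
        return (short) ((high << 8) | (low & 0xFF));
    }

    public static void main(String[] args) {
        // An 8-bit sample of 64 scaled to 16 bits is 64 * 256 = 0x4000.
        short sample = (short) (64 * 256);
        byte[] little = getBytes(sample, false); // {0x00, 0x40}
        byte[] big = getBytes(sample, true);     // {0x40, 0x00}
        System.out.println(little[0] + " " + little[1]); // prints "0 64"
        System.out.println(big[0] + " " + big[1]);       // prints "64 0"
    }
}
```

Reading the bytes back with the wrong flag turns 0x4000 into 0x0040, which is why a mismatch produces noise or very quiet audio rather than an obvious error.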

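2.2.3. Mixing and Output

With recording and background-music processing done, what remains is the mixing itself. What comes to mind first when we think of mixing? Addition, and that is right. Audio mixing is not mysterious: at its core it is the sum of the data of audio files that share the same parameters. Real-world mixing is often not that simple, and a search for "mixing algorithm" turns up many sophisticated approaches, but for now we do not need a complex algorithm; adding the raw audio data of the two files is enough. To make the mixing slightly more capable, the method provided here also allows either audio file to be offset in time relative to the other, and the volume of each to be adjusted through two weights. The code and its explanation follow.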
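The per-sample arithmetic behind "mixing is addition" can be sketched as follows. The source of CommonFunction.WeightShort is not shown in this post, so this is an illustration rather than the original helper; in particular, clamping the sum to the 16-bit range is an assumption added here to avoid wrap-around on loud passages. All names are hypothetical.

```java
/** Mixes two streams of 16-bit samples by weighted addition. */
final class MixDemo {

    /** Weighted sum of two samples, clamped to the range of a short. */
    static short mixSample(short first, short second, float firstWeight, float secondWeight) {
        int mixed = Math.round(first * firstWeight + second * secondWeight);
        if (mixed > Short.MAX_VALUE) {
            return Short.MAX_VALUE;
        }
        if (mixed < Short.MIN_VALUE) {
            return Short.MIN_VALUE;
        }
        return (short) mixed;
    }

    /** Mixes two arrays; where one is shorter, it contributes silence. */
    static short[] mix(short[] first, short[] second, float firstWeight, float secondWeight) {
        short[] output = new short[Math.max(first.length, second.length)];
        for (int i = 0; i < output.length; i++) {
            short a = i < first.length ? first[i] : 0;
            short b = i < second.length ? second[i] : 0;
            output[i] = mixSample(a, b, firstWeight, secondWeight);
        }
        return output;
    }

    public static void main(String[] args) {
        short[] voice = {1000, 2000, 3000};
        short[] music = {400, -400};
        short[] mixed = mix(voice, music, 1.0f, 0.5f);
        System.out.println(java.util.Arrays.toString(mixed)); // prints "[1200, 1800, 3000]"
    }
}
```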

public static void ComposeAudio(String firstAudioFilePath, String secondAudioFilePath,
        String composeAudioFilePath, boolean deleteSource,
        float firstAudioWeight, float secondAudioWeight, int audioOffset,
        final ComposeAudioInterface composeAudioInterface) {
    boolean firstAudioFinish = false;
    boolean secondAudioFinish = false;

    byte[] firstAudioByteBuffer;
    byte[] secondAudioByteBuffer;
    byte[] mp3Buffer;

    short resultShort;
    short[] outputShortArray;

    int index;
    int firstAudioReadNumber;
    int secondAudioReadNumber;
    int outputShortArrayLength;
    final int byteBufferSize = 1024;

    firstAudioByteBuffer = new byte[byteBufferSize];
    secondAudioByteBuffer = new byte[byteBufferSize];
    mp3Buffer = new byte[(int) (7200 + (byteBufferSize * 1.25))];
    outputShortArray = new short[byteBufferSize / 2];

    Handler handler = new Handler(Looper.getMainLooper());

    FileInputStream firstAudioInputStream =
            FileFunction.GetFileInputStreamFromFile(firstAudioFilePath);
    FileInputStream secondAudioInputStream =
            FileFunction.GetFileInputStreamFromFile(secondAudioFilePath);
    FileOutputStream composeAudioOutputStream =
            FileFunction.GetFileOutputStreamFromFile(composeAudioFilePath);

    LameUtil.init(Constant.RecordSampleRate, Constant.LameBehaviorChannelNumber,
            Constant.BehaviorSampleRate, Constant.LameBehaviorBitRate,
            Constant.LameMp3Quality);

    try {
        // Phase 1: output the leading, unmixed part of whichever file starts first.
        // Each chunk is encoded before the loop condition is re-checked, and reads are
        // capped at the remaining offset, so no data is dropped and the two files stay
        // aligned when phase 2 begins.
        while (audioOffset != 0 && !firstAudioFinish && !secondAudioFinish) {
            if (audioOffset < 0) {
                secondAudioReadNumber = secondAudioInputStream.read(secondAudioByteBuffer, 0,
                        (int) Math.min(byteBufferSize, -(long) audioOffset));
                if (secondAudioReadNumber < 0) {
                    secondAudioFinish = true;
                    break;
                }

                outputShortArrayLength = secondAudioReadNumber / 2;
                for (index = 0; index < outputShortArrayLength; index++) {
                    resultShort = CommonFunction.GetShort(secondAudioByteBuffer[index * 2],
                            secondAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);
                    outputShortArray[index] = (short) (resultShort * secondAudioWeight);
                }

                audioOffset += secondAudioReadNumber;
            } else {
                firstAudioReadNumber = firstAudioInputStream.read(firstAudioByteBuffer, 0,
                        Math.min(byteBufferSize, audioOffset));
                if (firstAudioReadNumber < 0) {
                    firstAudioFinish = true;
                    break;
                }

                outputShortArrayLength = firstAudioReadNumber / 2;
                for (index = 0; index < outputShortArrayLength; index++) {
                    resultShort = CommonFunction.GetShort(firstAudioByteBuffer[index * 2],
                            firstAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);
                    outputShortArray[index] = (short) (resultShort * firstAudioWeight);
                }

                audioOffset -= firstAudioReadNumber;
            }

            if (outputShortArrayLength > 0) {
                int encodedSize = LameUtil.encode(outputShortArray, outputShortArray,
                        outputShortArrayLength, mp3Buffer);
                if (encodedSize > 0) {
                    composeAudioOutputStream.write(mp3Buffer, 0, encodedSize);
                }
            }
        }

        handler.post(new Runnable() {
            @Override
            public void run() {
                if (composeAudioInterface != null) {
                    composeAudioInterface.updateComposeProgress(20);
                }
            }
        });

        // Phase 2: mix the two files until both are exhausted.
        while (!firstAudioFinish || !secondAudioFinish) {
            index = 0;

            firstAudioReadNumber = firstAudioInputStream.read(firstAudioByteBuffer);
            secondAudioReadNumber = secondAudioInputStream.read(secondAudioByteBuffer);

            int minAudioReadNumber = Math.min(firstAudioReadNumber, secondAudioReadNumber);
            int maxAudioReadNumber = Math.max(firstAudioReadNumber, secondAudioReadNumber);

            if (firstAudioReadNumber < 0) {
                firstAudioFinish = true;
            }
            if (secondAudioReadNumber < 0) {
                secondAudioFinish = true;
            }

            int halfMinAudioReadNumber = minAudioReadNumber / 2;
            outputShortArrayLength = maxAudioReadNumber / 2;

            // Samples present in both files: weighted sum.
            for (; index < halfMinAudioReadNumber; index++) {
                resultShort = CommonFunction.WeightShort(firstAudioByteBuffer[index * 2],
                        firstAudioByteBuffer[index * 2 + 1],
                        secondAudioByteBuffer[index * 2],
                        secondAudioByteBuffer[index * 2 + 1],
                        firstAudioWeight, secondAudioWeight, Variable.isBigEnding);
                outputShortArray[index] = resultShort;
            }

            // Samples present in only one file: output that file alone.
            if (firstAudioReadNumber != secondAudioReadNumber) {
                if (firstAudioReadNumber > secondAudioReadNumber) {
                    for (; index < outputShortArrayLength; index++) {
                        resultShort = CommonFunction.GetShort(firstAudioByteBuffer[index * 2],
                                firstAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);
                        outputShortArray[index] = (short) (resultShort * firstAudioWeight);
                    }
                } else {
                    for (; index < outputShortArrayLength; index++) {
                        resultShort = CommonFunction.GetShort(secondAudioByteBuffer[index * 2],
                                secondAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);
                        outputShortArray[index] = (short) (resultShort * secondAudioWeight);
                    }
                }
            }

            if (outputShortArrayLength > 0) {
                int encodedSize = LameUtil.encode(outputShortArray, outputShortArray,
                        outputShortArrayLength, mp3Buffer);
                if (encodedSize > 0) {
                    composeAudioOutputStream.write(mp3Buffer, 0, encodedSize);
                }
            }
        }
    } catch (Exception e) {
        LogFunction.error("Exception in ComposeAudio", e);

        handler.post(new Runnable() {
            @Override
            public void run() {
                if (composeAudioInterface != null) {
                    composeAudioInterface.composeFail();
                }
            }
        });
        return;
    }

    handler.post(new Runnable() {
        @Override
        public void run() {
            if (composeAudioInterface != null) {
                composeAudioInterface.updateComposeProgress(50);
            }
        }
    });

    try {
        // Flush whatever the encoder still holds.
        final int flushResult = LameUtil.flush(mp3Buffer);
        if (flushResult > 0) {
            composeAudioOutputStream.write(mp3Buffer, 0, flushResult);
        }
    } catch (Exception e) {
        LogFunction.error("Failed to flush LameUtil in ComposeAudio", e);
    } finally {
        try {
            composeAudioOutputStream.close();
        } catch (Exception e) {
            LogFunction.error("Failed to close the mixed audio output stream", e);
        }
        LameUtil.close();
    }

    if (deleteSource) {
        FileFunction.DeleteFile(firstAudioFilePath);
        FileFunction.DeleteFile(secondAudioFilePath);
    }

    try {
        firstAudioInputStream.close();
        secondAudioInputStream.close();
    } catch (IOException e) {
        LogFunction.error("Failed to close the mixing input streams", e);
    }

    handler.post(new Runnable() {
        @Override
        public void run() {
            if (composeAudioInterface != null) {
                composeAudioInterface.composeSuccess();
            }
        }
    });
}

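The ComposeAudio method is the concrete implementation of the mixing. Among its parameters, firstAudioFilePath and secondAudioFilePath are the paths of the audio files to mix; composeAudioFilePath is the storage path of the resulting MP3 file; firstAudioWeight and secondAudioWeight are the volume weights of the two files during mixing; and audioOffset is the data offset of the first file relative to the second. If audioOffset is negative, the output starts with |audioOffset| bytes of the second file alone; if it is positive, the output starts with audioOffset bytes of the first file alone. audioOffset therefore also represents a time offset. The two files we mix are 16-bit, mono, 44.1 kHz, so with audioOffset set to 1 × (16 / 8) × 1 × 44,100 = 88,200 bytes, the final MP3 plays one second of the first file alone and then the sum of the two files.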
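Since audioOffset is expressed in bytes, a caller that thinks in time needs a conversion. The following is a minimal sketch of one; the helper is hypothetical and not part of the original project. It rounds to whole frames so that the offset never splits a sample.

```java
/** Converts a time offset into a byte offset for raw PCM data. */
final class OffsetDemo {

    /**
     * Returns the byte offset for the given duration. A negative duration yields a
     * negative offset. The result is always a multiple of the frame size.
     */
    static int millisToOffsetBytes(long millis, int sampleRateHz, int channels,
            int bytesPerSample) {
        long frames = millis * sampleRateHz / 1000;
        return Math.toIntExact(frames * channels * bytesPerSample);
    }

    public static void main(String[] args) {
        // 16-bit mono at 44.1 kHz, as in the example above.
        System.out.println(millisToOffsetBytes(1000, 44_100, 1, 2)); // prints "88200"
        System.out.println(millisToOffsetBytes(-500, 44_100, 1, 2)); // prints "-44100"
    }
}
```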
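The overall mixing code is clear. Because of the time offset, one file may run out before the other, and the code handles that case explicitly. The same can happen without any offset: if the music lasts 2 minutes and the recording 3 minutes, only the recording should be output once the music ends. The code also uses the LAME MP3 encoding library to encode the PCM data to MP3. Beyond that, there are no complicated parts.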