
How to fix the input type and weight type mismatch error in the Hugging Face Transformer model


We previously introduced the Transformers model platform in "[Hugging Face] Ep.1: An AI platform anyone can play with". Many users run into problems while working with it, so we are collecting the issues we have encountered and sharing solutions for anyone who needs them.

Problem

Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Background

The story goes like this: Xiao Ming, a software engineer specializing in speech recognition, hit an error at a critical moment while running a wav2vec2 speech recognition model. Since other people may encounter the same error, he decided to write up the process to help fellow engineers on the same technical road get past it.


First, Xiao Ming chose a wav2vec2 speech recognition model and loaded the Chinese checkpoint "wav2vec2-large-xlsr-53-chinese-zh-cn-gpt". He wanted to use the GPU to speed up recognition, so he set DEVICE to 'cuda'.

import torch
import soundfile as sf
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

SRC_MODEL = 'ydshieh/wav2vec2-large-xlsr-53-chinese-zh-cn-gpt'
DEVICE = 'cuda'
processor = Wav2Vec2Processor.from_pretrained(SRC_MODEL)
model = Wav2Vec2ForCTC.from_pretrained(SRC_MODEL).to(DEVICE)

He then tried to transcribe an audio file directly:

audio_buffer, _ = sf.read('test.wav')
input_values = processor(audio_buffer, sampling_rate=16000, return_tensors='pt').input_values
logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.decode(predicted_ids[0])
transcription

But something went wrong… what should he do?

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) …

Cause

The error message says that the input is a CPU tensor (torch.FloatTensor) while the model weights are GPU tensors (torch.cuda.FloatTensor). PyTorch cannot run an operation on tensors that live on different devices, so the input data needs to be moved to the GPU to match the model.
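A quick way to confirm this kind of mismatch is to compare the input tensor's device with the device of the model's weights. Here is a minimal, CPU-only illustration of the check (using a stand-in `torch.nn.Linear` model, not the actual Wav2Vec2 model):

```python
import torch

# Stand-in model and input; both live on the CPU by default.
model = torch.nn.Linear(4, 2)
x = torch.randn(1, 4)

# The model's device can be read from any of its parameters.
weight_device = next(model.parameters()).device
print(x.device, weight_device)  # both 'cpu' here, so the forward pass works

# Moving the input to the weights' device is the general fix
# (a no-op here, since both are already on the CPU).
x = x.to(weight_device)
y = model(x)
print(y.shape)  # torch.Size([1, 2])
```

If `x.device` and `weight_device` had differed (e.g. `cpu` vs `cuda:0`), the forward pass would raise exactly the RuntimeError shown above.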

Solution

We move the audio input onto the GPU so that it becomes a torch.cuda.FloatTensor, before passing it to the model:

input_values = input_values.to(DEVICE)

Now the model and the data are on the same device. GPU and CPU tensors cannot be mixed in a single computation, so always check that your inputs are on the same device as the model before running it.
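A defensive pattern that avoids this error entirely is to pick the device once, based on availability, and move both the model and every input tensor to it. A minimal sketch (using a stand-in `torch.nn.Linear` model in place of the Wav2Vec2 model, so it also runs on machines without a GPU):

```python
import torch

# Fall back to the CPU when no GPU is available.
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

model = torch.nn.Linear(4, 2).to(DEVICE)      # stand-in for the Wav2Vec2 model
input_values = torch.randn(1, 4).to(DEVICE)   # stand-in for the processed audio

logits = model(input_values)                  # devices match, no RuntimeError
print(logits.device)
```

With this pattern the same script works on both GPU and CPU-only machines, and the input/weight device mismatch cannot occur.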
