Accelerate Text Generation with LSTM Using Intel® Extension for TensorFlow*



Text generation is a task where an AI system generates written content or text resembling human-written text. The process involves generating new text based on given prompts by training AI models on large datasets of text to learn patterns and styles. It is useful for various real-world applications such as content creation, chatbots, and text summarization. In general, text generation can be addressed with Markov processes or deep generative models like long short-term memory (LSTM).

In this article, we present a code sample on how to train your model faster for text generation with LSTM by using Intel® Extension for TensorFlow*.

About LSTM

The LSTM neural network, first proposed by Hochreiter and Schmidhuber (1997), is a recurrent neural network that contains LSTM cells as neurons in some of its layers. Each LSTM layer contains many LSTM cells. An LSTM cell is a processing unit that handles sequential data and carries its hidden state (short-term memory) from one time step to the next. At each time step, a cell looks at the current input and at its own output from the previous time step. Several gates determine how much the previous time step influences the current one and which information is important enough to be passed on through the network. The logic behind the gates in a single LSTM cell is:

  • Forget gate: Determines how much of the cell state from the previous time step carries over to the current one. It uses a sigmoid layer (the result is between 0.0 and 1.0) to decide how much should be forgotten (near 0.0) or remembered (near 1.0).
  • Input gate: Decides which new information from the current input is written to the cell state. It does that with an element-wise product of a sigmoid unit (the input gate itself) and a tanh unit (the candidate values).
  • Output gate: Performs another element-wise product, this time of a sigmoid unit and the tanh of the updated cell state. The result is the hidden state, which is passed from the current time step to the next.

As the name suggests, the cell has both long-term and short-term memory. The cell state, which is passed along from one time step to the next, acts as the long-term memory: it carries information from earlier time steps through the network. The hidden state of the cell corresponds to short-term memory, as it mainly reflects the current time step.
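
To make the gate logic concrete, here is a minimal sketch of a single LSTM time step in plain Python. It assumes a scalar cell (input size 1, hidden size 1) with made-up weights, so the names `lstm_cell_step` and `weights` are illustrative only and are not how TensorFlow or Intel Extension for TensorFlow implement the layer.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One time step of a scalar LSTM cell.

    `w` maps each gate name to hypothetical (w_x, w_h, bias) weights.
    """
    def pre(name):
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the candidate to write
    g = math.tanh(pre("candidate"))  # candidate values for the cell state
    o = sigmoid(pre("output"))       # how much of the cell state to expose

    c = f * c_prev + i * g           # long-term memory (cell state)
    h = o * math.tanh(c)             # short-term memory (hidden state)
    return h, c

# Illustrative weights, identical for every gate (an assumption of this sketch)
weights = {name: (0.5, 0.5, 0.0)
           for name in ("forget", "input", "candidate", "output")}

h, c = 0.0, 0.0
for x in [1.0, -1.0, 0.5]:
    h, c = lstm_cell_step(x, h, c, weights)
print(h, c)
```

In a real layer, `x`, `h`, and `c` are vectors and the weights are matrices, but the way the gates combine stays the same.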

How to Optimize an LSTM Model with Intel Extension for TensorFlow

Intel Extension for TensorFlow is a heterogeneous, high-performance, deep learning extension plug-in. It is based on the TensorFlow Pluggable Device interface to bring Intel hardware accelerators into the TensorFlow open source community for AI workload acceleration, and it up-streams several optimizations into open source TensorFlow. Learn how to install it as a stand-alone tool or get it as part of AI Tools.

Intel Extension for TensorFlow offers operator overrides for some of the Keras layers. These operators are semantically the same as their stock TensorFlow counterparts, so using them is simple: when creating a model, use itex.ops.ItexLSTM instead of tf.keras.layers.LSTM, as shown in the following example.


Stock TensorFlow:

>>> import tensorflow as tf
>>> inputs = tf.random.normal([32, 10, 8])
>>> lstm = tf.keras.layers.LSTM(4)
>>> output = lstm(inputs)
>>> print(output.shape)
(32, 4)

Intel Extension for TensorFlow:

>>> import intel_extension_for_tensorflow as itex
>>> inputs = tf.random.normal([32, 10, 8])
>>> lstm = itex.ops.ItexLSTM(4)
>>> output = lstm(inputs)
>>> print(output.shape)
(32, 4)


Based on available runtime hardware and constraints, the Intel Extension for TensorFlow LSTM layer chooses different implementations (based on Intel Extension for TensorFlow or stock TensorFlow) to maximize performance.
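
The selection described above happens inside the layer. A script that must also run on machines where the extension is not installed could make a similar choice one level up. The following is a rough sketch under that assumption: the helper name `resolve_lstm_layer` and the fallback order are hypothetical and are not part of either library.

```python
import importlib

def resolve_lstm_layer():
    """Return (layer_class, source) for an LSTM layer that can be imported.

    Tries the Intel Extension for TensorFlow operator first, then the
    stock Keras layer. The order is a choice made for this sketch.
    """
    try:
        itex = importlib.import_module("intel_extension_for_tensorflow")
        return itex.ops.ItexLSTM, "itex"
    except (ImportError, AttributeError):
        pass
    try:
        tf = importlib.import_module("tensorflow")
        return tf.keras.layers.LSTM, "keras"
    except (ImportError, AttributeError):
        return None, "unavailable"

lstm_cls, source = resolve_lstm_layer()
print(f"LSTM layer source: {source}")
```

Because the two layers are used the same way in the example above, the rest of the model code can call `lstm_cls(units)` without caring which one was picked.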

Code Implementation

The code sample shows how to train the model for text generation using LSTM and Intel Extension for TensorFlow on Intel GPUs. The main goal of the text generation model is to predict the probability of the next word in a sequence when the previous words are given as input. The code sample also highlights the crucial steps required for transitioning the existing script (model training with LSTM) to Intel hardware using Intel Extension for TensorFlow.

  1. Download the dataset. We are using The Republic by Plato from Project Gutenberg*.
    import string
    import requests
    response = requests.get('')
    data = response.text.split('\n')
    data = " ".join(data)
  1. Prepare the data. Clean the text and update the training data width by reducing the number of words, as we are using longer text.

    def clean_text(doc):
        tokens = doc.split()
        table = str.maketrans('', '', string.punctuation)
        tokens = [(w.translate(table)) for w in tokens] # strip punctuation from each token
        tokens = [word for word in tokens if word.isalpha()] # keep purely alphabetic tokens only
        tokens = [word.lower() for word in tokens]
        return tokens
    tokens = clean_text(data)
    def get_aligned_training_data(text_tokens, train_data_width):
        # Each line holds train_data_width input words plus one target word
        length = train_data_width + 1
        lines = []
        for i in range(length, len(text_tokens)):
            seq = text_tokens[i - length:i]
            line = ' '.join(seq)
            lines.append(line)
        return lines
    lines = get_aligned_training_data(tokens, 50)
  2. Check available devices. Take advantage of Intel GPUs for model training. Make sure that the device is available for TensorFlow.

    import tensorflow as tf
    xpus = tf.config.list_physical_devices()
    print(xpus)
  3. Prepare the tokenization function. The text data needs to be tokenized for training. This means that every word is assigned an index.

    # Tokenization
    import numpy as np
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.utils import to_categorical
    # Keras layers
    from tensorflow.keras.layers import Embedding, Dense
    from tensorflow.keras.models import Sequential
    def tokenize_prepare_dataset(lines):
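        tokenizer = Tokenizer()
        # Build the word index from the training lines
        tokenizer.fit_on_texts(lines)
        # Get vocabulary size of our model (index 0 is reserved, hence + 1)
        vocab_size = len(tokenizer.word_index) + 1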
        sequences = tokenizer.texts_to_sequences(lines)
        # Convert to numpy matrix
        sequences = np.array(sequences)
        x, y = sequences[:, :-1], sequences[:, -1]
        y = to_categorical(y, num_classes=vocab_size)
        return x, y, tokenizer
    x, y, itex_tokenizer = tokenize_prepare_dataset(lines)
    seq_length = x.shape[1]
    vocab_size = y.shape[1]
  4. Create the model with the Intel Extension for TensorFlow LSTM operator. After creating the model, add an embedding layer and optimized LSTM layer from Intel Extension for TensorFlow.

    import intel_extension_for_tensorflow as itex
    neuron_coef = 4
    itex_lstm_model = Sequential()
    itex_lstm_model.add(Embedding(input_dim=vocab_size, output_dim=seq_length, input_length=seq_length))
    itex_lstm_model.add(itex.ops.ItexLSTM(seq_length * neuron_coef, return_sequences=True))
    itex_lstm_model.add(itex.ops.ItexLSTM(seq_length * neuron_coef))
    itex_lstm_model.add(Dense(units=seq_length * neuron_coef, activation='relu'))
    itex_lstm_model.add(Dense(units=vocab_size, activation='softmax'))
    itex_lstm_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
  5. Train the model. We are using batch size 256 and training over 200 epochs.

    itex_lstm_model.fit(x, y, batch_size=256, epochs=200)
  6. Create a Keras* LSTM model for comparison.

    from tensorflow.keras.layers import LSTM
    # Reducing the sequence to 10 compared to 50 with Itex LSTM
    lines = get_aligned_training_data(tokens, 10)
    # Tokenization
    x, y, keras_tokenizer = tokenize_prepare_dataset(lines)
    seq_length = x.shape[1]
    vocab_size = y.shape[1]
    neuron_coef = 1
    keras_lstm_model = Sequential()
    keras_lstm_model.add(Embedding(input_dim=vocab_size, output_dim=seq_length, input_length=seq_length))
    keras_lstm_model.add(LSTM(seq_length * neuron_coef, return_sequences=True))
    keras_lstm_model.add(LSTM(seq_length * neuron_coef))
    keras_lstm_model.add(Dense(units=seq_length * neuron_coef, activation='relu'))
    keras_lstm_model.add(Dense(units=vocab_size, activation='softmax'))
    keras_lstm_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
    keras_lstm_model.fit(x, y, batch_size=256, epochs=20)
  7. Generate text based on given input. We use trained models to generate sentences based on the given input (seed text). You can input your own line, but for the best results, take the input line from the text. To do this, we create a custom function for text generation.

    from tensorflow.keras.preprocessing.sequence import pad_sequences
    def generate_text_seq(model, tokenizer, text_seq_length, seed_text, generated_words_count):
        text = []
        input_text = seed_text
        for _ in range(generated_words_count):
            encoded = tokenizer.texts_to_sequences([input_text])[0]
            encoded = pad_sequences([encoded], maxlen = text_seq_length, truncating = 'pre')
            # Predict next-word probabilities and pick the most likely index
            predict_x = model.predict(encoded, verbose=0)
            y_predict = np.argmax(predict_x, axis=1)[0]
            predicted_word = ''
            for word, index in tokenizer.word_index.items():
                if index == y_predict:
                    predicted_word = word
                    break
            # Feed the prediction back in as context for the next step
            input_text += ' ' + predicted_word
            text.append(predicted_word)
        return ' '.join(text)
    import random
    # randint includes both endpoints, so stop at the last valid index
    random_index = random.randint(0, len(lines) - 1)
    random_seed_text = lines[random_index]
    number_of_words_to_generate = 10
    generated_text = generate_text_seq(itex_lstm_model, itex_tokenizer, 50, random_seed_text, number_of_words_to_generate)
    print(generated_text)
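
To see what the data preparation and tokenization steps produce without downloading the book or installing TensorFlow, the following sketch runs the same idea on a six-word toy sentence. The helper names are hypothetical, and `build_word_index` is a simplified stand-in that numbers words in order of first appearance. It is not the Keras Tokenizer, which orders its index by word frequency.

```python
def sliding_windows(tokens, width):
    """Mirror the article's loop: windows of width + 1 tokens ending before index i."""
    length = width + 1
    return [" ".join(tokens[i - length:i]) for i in range(length, len(tokens))]

def build_word_index(lines):
    """Assign each distinct word an integer starting at 1 (0 is left for padding)."""
    index = {}
    for line in lines:
        for word in line.split():
            index.setdefault(word, len(index) + 1)
    return index

def split_inputs_targets(lines, word_index):
    """Encode each line, then split it into (all words but the last, last word)."""
    encoded = [[word_index[w] for w in line.split()] for line in lines]
    inputs = [seq[:-1] for seq in encoded]
    targets = [seq[-1] for seq in encoded]
    return inputs, targets

toy_tokens = "the cat sat on the mat".split()
toy_lines = sliding_windows(toy_tokens, 3)
toy_index = build_word_index(toy_lines)
toy_x, toy_y = split_inputs_targets(toy_lines, toy_index)

for line, inputs, target in zip(toy_lines, toy_x, toy_y):
    print(f"{line!r}: inputs={inputs} target={target}")
```

Each training line supplies the first words as model input and the final word as the label to predict, which is the shape the `x` and `y` arrays take in the steps above before `y` is one-hot encoded.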

This code sample showcases how to implement an LSTM model for text generation. From the output, you can observe the improvement in training performance with Intel Extension for TensorFlow on Intel GPUs. Try out the code sample on Linux* and Jupyter* Notebook.
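
One simple way to put a number on training time is to wrap the training call in a wall-clock timer. The helper below is a sketch with a hypothetical name, `time_call`, and it times a stand-in workload so that it runs anywhere. In the script above, the timed call would be something like `itex_lstm_model.fit` or `keras_lstm_model.fit`.

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload; with the article's models this could be, for example,
# time_call(itex_lstm_model.fit, x, y, batch_size=256, epochs=200)
result, seconds = time_call(sum, range(1_000_000))
print(f"elapsed: {seconds:.4f} s")
```

Note that the two models in this sample differ in sequence length, layer width, and number of epochs, so raw totals are not directly comparable. Dividing by the epoch count, or training both models with the same settings, gives a fairer comparison.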

Next Steps

Use the most up-to-date Intel software and hardware optimizations for TensorFlow to speed up the training and inference performance on Intel hardware. To jump-start your AI workloads, see Intel Extension for TensorFlow.

We encourage you to also check out and incorporate Intel’s other AI and machine learning framework optimizations and end-to-end portfolio of tools into your AI workflow. Learn about the unified, open, standards-based oneAPI programming model that forms the foundation of Intel® AI Portfolio to help you prepare, build, deploy, and scale your AI solutions.

