Speed up NumPy with the new TF NumPy API

Koushik kumar
4 min read · May 24, 2021

At the recent Google I/O 2021 event, the TensorFlow team announced a NumPy API. Kemal El Moujahid, product director for TensorFlow and Machine Learning at Google, stated, “… In research, for example, we heard that many users prefer to directly use NumPy. The experimental tf.numpy, which is part of our latest 2.5 release, combines the simplicity of the NumPy API with the power of TensorFlow.”

NumPy is a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Anyone in the field of data science, or who codes in Python, would know NumPy; it is helpful for vectors, matrices, element-wise operations, math functions and many more operations. Images, audio, video, text and data from CSV files are read as arrays in most scenarios before pre-processing. However, NumPy has no direct support for running code on GPU or TPU accelerators.

GPU support for NumPy can be enabled through ‘Numba’, an open-source JIT compiler that translates a subset of Python and NumPy into fast machine code. Though the learning curve with Numba isn’t steep, it is one more library to add for code execution and deployment. Also, Numba is limited to compiling individual functions, lacks support for lists and dictionaries with mixed types, and supports only a subset of NumPy’s methods.
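As a quick illustration of what Numba usage looks like, the sigmoid-of-dot-product computation used later in this post can be JIT-compiled with Numba’s @njit decorator. This is a minimal sketch, not part of the original article; the try/except guard falls back to plain NumPy if Numba is not installed:

```python
import numpy as np

try:
    from numba import njit  # Numba JIT-compiles individual functions
except ImportError:          # fall back to plain NumPy if Numba is absent
    def njit(func):
        return func

@njit
def sigmoid_dot(inp, wt):
    # Element-wise sigmoid of the matrix dot product
    y = np.dot(inp, wt)
    return 1. / (1. + np.exp(-y))

# Small shapes for illustration; the article's benchmark uses [256, 1024] x [1024, 1024]
logit = sigmoid_dot(np.random.rand(4, 8), np.random.rand(8, 8) / 3)
print(logit.shape)  # (4, 8)
```

Note how the whole function, not just individual array operations, is handed to the compiler; this is exactly the “compiling individual functions” granularity mentioned above.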

Let us explore the TF NumPy API through a small snippet of code and compare it with NumPy. To better understand and compare the time taken by TF NumPy and NumPy, this snippet is run in a loop 100 times.

### The code snippet computes sigmoid of dot product of two matrices with shapes [256, 1024] and [1024, 1024] 

inp = np.random.rand(256,1024)
wt = np.random.rand(1024, 1024) / 3
y = np.dot(inp, wt)
logit = 1. / (1. + np.exp(-y))

Running the NumPy code snippet in Colab with the runtime type set to ‘CPU’, the execution took 4.8208 seconds. The same code with runtime type ‘GPU’ showed the same result: 4.8208 seconds.

import numpy as np
import time


def numpy_library():
    begin = time.time()
    for i in range(100):
        inp = np.random.rand(256, 1024)
        wt = np.random.rand(1024, 1024) / 3
        y = np.dot(inp, wt)
        logit = 1. / (1. + np.exp(-y))
    end = time.time()
    return end - begin


print("Time taken through numpy_library is " + str(numpy_library()))

Time taken through numpy_library is 4.8208

Working with TF NumPy API:

NumPy is available as tf.experimental.numpy, allowing GPU acceleration by TensorFlow while also allowing access to all TF APIs. The library is imported as tnp and is available in TF version ≥ 2.5.0. The TF NumPy features are easily accessible on Colab. You may access the code by clicking on this link.

import tensorflow.experimental.numpy as tnp
import time


def numpy_tensorflow():
    begin = time.time()
    for i in range(100):
        inp = tnp.random.rand(256, 1024)
        wt = tnp.random.rand(1024, 1024) / 3
        y = tnp.dot(inp, wt)
        logit = 1. / (1. + tnp.exp(-y))
    end = time.time()
    return end - begin


print("Time taken through TensorFlow NumPy API is " + str(numpy_tensorflow()))

Time taken through TensorFlow NumPy API is 0.2473

CPU execution with the TF NumPy API showed no difference in execution time compared with NumPy. Also, the first execution takes comparatively more time, as it involves compilation.
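Because the first call can include one-time compilation cost, a common benchmarking pattern is to run a warm-up call before starting the timer. A sketch of that pattern, using plain NumPy as a stand-in (pass tnp instead of np to time the TF NumPy version):

```python
import numpy as np
import time


def workload(lib):
    # The same sigmoid-of-dot-product benchmark, parameterized over the library
    inp = lib.random.rand(256, 1024)
    wt = lib.random.rand(1024, 1024) / 3
    y = lib.dot(inp, wt)
    return 1. / (1. + lib.exp(-y))


workload(np)  # warm-up call: excludes any one-time compilation/setup cost

begin = time.time()
for _ in range(10):
    workload(np)
elapsed = time.time() - begin
print("Elapsed after warm-up: %.4f s" % elapsed)
```

This keeps the reported number focused on steady-state throughput rather than first-call overhead.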

The computation time depends on the GPU device, which in this scenario is a Tesla K80.

Features of the TensorFlow NumPy API:

a. TensorFlow NumPy ND array

An instance of tf.experimental.numpy.ndarray represents a multidimensional dense array of a given datatype. Methods available on this instance include ndarray.T, ndarray.reshape, ndarray.ravel and others.

ones = tnp.ones([5, 3], dtype=tnp.float32)
print("Created ndarray: \n" + str(ones))
print("\n Created ND array with shape = %s, rank = %s, dtype = %s\n" %
      (ones.shape, ones.ndim, ones.dtype))

print("Transpose of the matrix is: \n" + str(ones.T))
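The text above also mentions ndarray.reshape and ndarray.ravel. Since tnp mirrors NumPy’s API, these calls can be sketched with plain NumPy (swap np for tnp to run the same lines on TensorFlow):

```python
import numpy as np  # stand-in: tnp.ones(...) and its methods behave the same

ones = np.ones([5, 3], dtype=np.float32)
print(ones.reshape(3, 5).shape)  # (3, 5): same 15 elements, new shape
print(ones.ravel().shape)        # (15,): flattened to one dimension
print(ones.T.shape)              # (3, 5): transpose swaps the axes
```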

b. Broadcasting

Just as TensorFlow tensors broadcast across matrices, TF NumPy supports broadcasting. Diving deeper, a TF NumPy ND array is an alias for tf.Tensor, thereby allowing type promotion with ease.

x = tnp.ones([3, 3])
y = tnp.ones([3])
z = x + y

print("Value of Z is: \n " + str(z) )
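The broadcasting rule behind the example above: shapes are compared from the trailing dimension leftwards, and a missing dimension (or a dimension of size 1) is stretched to match. A plain-NumPy sketch of what happens to the shapes (tnp follows the same rule):

```python
import numpy as np

x = np.ones([3, 3])
y = np.ones([3])  # shape (3,) is treated as (1, 3), then stretched to (3, 3)
z = x + y

print(z.shape)   # (3, 3): the broadcast result keeps the larger shape
print(z[0, 0])   # 2.0: every element is 1 + 1
```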

c. NumPy interoperability

TensorFlow ND arrays can interoperate with NumPy functions. This is one of the key functionalities, connecting existing NumPy code with TF NumPy, and it helps with both speed and reliability.

np_sum = np.sum(tnp.ones([3, 5])) 
print("sum = %s. Class: %s" % (float(np_sum), np_sum.__class__))

There are a few tutorials on the TensorFlow page on applying TF NumPy to sentiment analysis, distributed image classification and a few other real-world applications, reducing computation time drastically. We shall take them up in the next blog. You can check further details on the TensorFlow official webpage: https://www.tensorflow.org/guide/tf_numpy

What are your views on TensorFlow NumPy API? Would you switch from existing NumPy when the feature is production ready? Comment below…

