Intel VTune Profiler Command Line Interface (CLI): Hotspots profiling for serial C code

In this post, we will look at the steps involved in profiling serial code with Intel VTune Profiler through its command line interface.

Let’s take a simple program that performs some array operations.

#include<stdio.h>
#include<stdlib.h>

#define ARRSIZE 99999999

void print_data(int *C)
{
	int i;
	//print the data
	printf("\tC Array values : ");
	for(i=0;i<ARRSIZE;i++)
	{
		printf("\t%d\n", C[i]);	
	}
}

void initialize(int *A, int *B, int *C)
{
	int i;
	//Initialize data to some value
	for(i=0;i<ARRSIZE;i++)
	{
		A[i] = 2*(i+1);
		B[i] = i+1;
	}
	//print_data(C);
}

void add_arrays(int *A, int *B, int *C)
{
	int i;
	for(i=0;i<ARRSIZE;i++)
	{
		C[i] = A[i] + B[i];
	}
	//print_data(C);
}

void sub_arrays(int *A, int *B, int *C)
{
	int i;
	for(i=0;i<ARRSIZE;i++)
	{
		C[i] = A[i] - B[i];
	}
	//print_data(C);
}

void mul_arrays(int *A, int *B, int *C)
{
	int i;
	for(i=0;i<ARRSIZE;i++)
	{
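		//A[i]*B[i] exceeds INT_MAX for large i, and signed overflow is undefined behavior.
		//Multiply as unsigned (wraps around) and convert back; fine for a profiling demo.
		C[i] = (int)((unsigned)A[i] * (unsigned)B[i]);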
	}
	//print_data(C);
}

void div_arrays(int *A, int *B, int *C)
{
	int i;
	for(i=0;i<ARRSIZE;i++)
	{
		C[i] = A[i] / B[i];
	}
	//print_data(C);
}

int main(int argc, char **argv)
{
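	int i;
	int *A, *B, *C;
	
	//Allocate and initialize the arrays (three arrays, roughly 1.2 GB in total with 4-byte int)
	A = (int *)malloc(ARRSIZE*sizeof(int));
	B = (int *)malloc(ARRSIZE*sizeof(int));
	C = (int *)malloc(ARRSIZE*sizeof(int));
	if(A == NULL || B == NULL || C == NULL)
	{
		fprintf(stderr, "Memory allocation failed\n");
		free(A);
		free(B);
		free(C);
		return 1;
	}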
	
	initialize(A,B,C);
	
	//print the data
	//printf("\nInitial data: \n");
	//for(i=0;i<ARRSIZE;i++)
	//{
	//	printf("\t%d \t %d\n", A[i], B[i]);
	//}	
	
	add_arrays(A,B,C);
	
	sub_arrays(A,B,C);
	
	mul_arrays(A,B,C);
	
	div_arrays(A,B,C);
	
	printf("\nProgram exit!\n");
	
	//Free arrays
	free(A); 
	free(B);
	free(C);
}

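Before profiling, we need to compile the code (saved here as mycode.c). We can use either the GNU compiler or the Intel compiler. The -g flag adds debug information, which lets VTune map the collected samples back to function names and source lines. Since no -o option is given, the executable is named a.out.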

  • Compiling with the GNU compiler:
gcc -g mycode.c

OR

  • Compiling with the Intel compiler:
icx -g mycode.c 

To profile the code with Intel VTune Profiler, we can use the following command –

vtune -collect hotspots -result-dir r000hs ./a.out

This command consists of the following components:

  • vtune : The command that invokes Intel VTune Profiler.
  • -collect hotspots : This option specifies the analysis to perform. Here it is the “hotspots” analysis, which identifies the sections of the code that consume the most CPU time. Other analysis types are also available: performance-snapshot, anomaly-detection, memory-consumption, uarch-exploration, memory-access, threading, hpc-performance, io, exp-offload, gpu-offload, gpu-hotspots, fpga-interaction, npu, system-overview, graphics-rendering, tsx-exploration, tsx-hotspots, sgx-hotspots, aps. The exact set depends on the installed VTune version.
  • -result-dir r000hs : This option sets the directory where the results are stored. Here, the results are saved in a directory named “r000hs”. Any name can be used.
  • ./a.out : The program to profile. Replace it with the actual name of your executable.

Once the profiling is complete, we can copy the result directory to our workstation (if the profiling was done on a remote system) and open it in the GUI:

vtune-gui ./r000hs/

VTune’s GUI will open the profiling result stored in the directory “./r000hs/”. A sample screenshot is provided here.

This screenshot shows the summary of the profiling. If we click on the “Bottom-up” tab at the top, we should see the following output –

It lists the functions of the program along with their CPU time, sorted in descending order of time.

Based on the Bottom-up list, we can target the most time-consuming functions for parallelization and optimization.

If you are interested in more articles on profiling, you can find them on this page.