


# INTEGRATION, the VLSI journal



journal homepage: www.elsevier.com/locate/vlsi

# Performance evaluation metrics for ring-oscillator-based temperature sensors on FPGAs: A quality factor



Navid Rahmanikia<sup>a</sup>, Amirali Amiri<sup>b</sup>, Hamid Noori<sup>a,\*</sup>, Farhad Mehdipour<sup>c</sup>

<sup>a</sup> Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

<sup>b</sup> Department of Informatics, Technical University of Munich, Munich, Germany

<sup>c</sup> E-JUST Center, Graduate School of Information Science and Electrical Eng., Kyushu University, Fukuoka, Japan

# ARTICLE INFO

Keywords: Quality factor; Performance evaluation metrics; Ring oscillator; Temperature sensor; Design space exploration; Field programmable gate array (FPGA)

# ABSTRACT

Aggressive scaling of process technologies, very high logic densities on integrated circuits, and growing design complexity have led to a drastic increase in power density. As a result, thermal issues have become a bottleneck in electronic design and have attracted considerable research attention in recent years. To address this issue, various dynamic thermal management (DTM) techniques have been proposed to keep system operation safe and reliable. Applying DTM techniques effectively requires a precise and reliable network of temperature sensors that measures local temperatures and provides an accurate thermal map of the chip. While many designers have used ring-oscillator (RO) circuits as temperature sensors in a network for measuring the thermal distribution and predicting the thermal behavior of a field programmable gate array (FPGA), a high-level study of the temperature sensors' design space is still missing. In this paper, a novel concept of the RO-based temperature sensor based on four basic evaluation metrics is presented. We introduce four evaluation metrics (i.e. the area, thermal, and power overheads, and the thermal map error) and some measurement methods for exploring and comparing the relative performance of different temperature sensor designs. Then, in order to make an optimal choice based on the metrics obtained, we propose a figure of merit (FOM) to characterize the efficiency of these designs. The proposed performance evaluation metric, the quality factor (QF), is based on the trade-offs between overheads and measurement accuracy across different designs of the RO-based temperature sensor. Consequently, the proposed QF metric is a single quantity representing the effectiveness, efficiency, and performance of a temperature sensor network, which can help the designer make a proper decision.
Moreover, in this work, a compact and ultra-sensitive RO-based temperature sensor is presented that uses only 5 look-up tables (LUTs), occupies 37.5% fewer resources than the most compact sensor, and provides 2.72 times higher sensitivity than the most sensitive design. In addition, several designs of the RO-based temperature sensor are explored in a network in terms of sensor configuration, RO length, and counter width, and are compared with each other in order to investigate their influence on the efficiency of the sensor network. According to the QF metric and the experimental results, the sensor network based on the proposed sensor is the most efficient among the alternative designs.

# 1. Introduction

Nowadays, smaller and faster are the well-known goals of very large scale integration (VLSI) chips because they lead to higher performance. However, higher performance results in higher power densities and, hence, higher operating temperatures. Another negative impact of technology node scaling is that leakage current, and hence static power, is steadily getting worse in today's field programmable gate array (FPGA) chips because smaller transistors make it easier for current to leak. Leakage current increases exponentially with temperature. Unfortunately, as leakage current increases, the junction temperature increases even further, causing a positive feedback loop between leakage current and junction temperature. This loop results in hotspots, which have much higher temperatures than the average die temperature and thus raise the local temperature of the chip. Rising temperature has negative effects on the reliability, timing uncertainty, static power, and mean time to failure (MTTF) of high-performance VLSI chips such as modern FPGAs. Therefore, high-end FPGAs suffer from thermal constraints. To address this problem, dynamic thermal management (DTM) techniques play a vital role as run-time preventive measures. Effective DTM techniques rely on an accurate thermal map in order to measure the temperature distribution and predict the thermal behavior of the chip at run-time [1]. The authors in [2] employ RO-based temperature sensors to implement a thermal management system test bench that studies eight different DTM methods. The test bench contains four emulated cores, each of which contains eight 8085 microprocessors. They apply the predictive and reactive hybrid DTM techniques on a multiprocessor system-on-chip (MPSoC) emulated on a Cyclone IV FPGA and report improvements of 28% and 41%, respectively, in the number of temperature readings exceeding 50 °C, compared with operation without DTM.

\* Corresponding author.

*E-mail addresses:* navid.rahmanikia@alumni.um.ac.ir (N. Rahmanikia), amirali.amiri@tum.de (A. Amiri), hnoori@um.ac.ir (H. Noori), farhad@ejust.kyushu-u.ac.jp (F. Mehdipour).

http://dx.doi.org/10.1016/j.vlsi.2016.12.007

Received 24 December 2015; Received in revised form 16 November 2016; Accepted 3 December 2016; Available online 09 December 2016

0167-9260/ © 2016 Elsevier B.V. All rights reserved.

Different approaches are available to measure the temperature of reconfigurable fabrics. One is to use a built-in thermal diode or embedded analog temperature sensor, also known as a physical sensor or hard-sensor, which is provided on some FPGAs, such as Xilinx Virtex-5 [3] and Altera Stratix IV [4]. However, because these FPGAs have only one sensor at a fixed location (usually in the center of the die), it can report the temperature at only one point on the die. Therefore, it is impossible to measure each local temperature and analyze the thermal distribution of the FPGA. Moreover, since hotspots are fundamentally design-dependent and occur locally in FPGA-based designs, it is essential to embed an array of thermal sensors in the design in order to measure its local temperatures and detect hotspots on-line so that preventive measures can be taken at run-time. Unfortunately, because hotspots are design-dependent, embedding multiple hard-sensors at the pre-manufacturing stage is not cost-effective. Another approach is to use infrared cameras, which are external devices that capture the infrared radiation emitted by chips to form an image. However, this method is not very flexible because infrared cameras are expensive and the image is affected not only by the inner temperature of the chip but also by the package surface. Moreover, for applying DTM techniques, providing feedback from these cameras to FPGAs is very difficult.

In recent years, researchers have given more consideration to soft-sensors for monitoring the thermal distribution of the FPGA without external devices. Many researchers utilize an array of temperature sensors based on a ring-oscillator (RO), the oscillation frequency of which is sensitive to temperature, for monitoring the thermal distribution of FPGAs. Unlike the previous methods, RO-based temperature sensors are reconfigurable, can be generated dynamically [5], instantiated as many times as necessary, and placed wherever needed on FPGAs, and they can be implemented using native resources (i.e. configurable logic blocks) [6–9]. As a result, instantiating soft-sensors at the post-manufacturing stage is highly beneficial.

While numerous publications such as [5-15] have presented different RO-based temperature sensor designs, which affect the efficiency of the sensor network, a high-level exploration of the sensors' design space is still missing. To address this issue, we first present four basic evaluation metrics, namely the area, thermal, and power overheads and the thermal map error, which are useful for comparing the relative performance of different temperature sensor designs. Then, a performance evaluation metric, the quality factor (QF), is proposed and defined for evaluating the efficiency of different RO-based sensor designs when the basic evaluation metrics are considered. Using the QF metric, it is possible to characterize each sensor network quantitatively and investigate the influence of the sensor design. The main goal of proposing the QF metric is to establish one single metric that integrates various criteria in order to get one single score for evaluating a set of possible sensor designs with respect to different metrics that are usually in conflict with each other, similar to metrics such as the energy-delay product (EDP). Obviously, different versions of the QF metric can exist, depending on which metric is more important. To achieve this goal, we first need to gather useful evaluation metrics that provide enough information to compare designs against one another. After obtaining the metrics, design space exploration (DSE) can help designers find the optimal sensor network among different design candidates according to the final goal, which is maximizing the QF value. Moreover, in this work, a compact and ultra-sensitive RO-based temperature sensor is introduced to counter the destructive effect of voltage scaling, i.e. the degradation of the sensor's sensitivity. The specific contributions of the presented work are as follows:

- (a) Four basic evaluation metrics are formulated and used for evaluating and comparing the relative performance of various temperature sensor designs.
- (b) The QF metric is introduced to characterize different designs of temperature sensors in a network when basic evaluation metrics are considered.
- (c) Twelve sensor networks with all possible sensor configurations are implemented and compared in terms of proposed evaluation metrics.
- (d) The influences of the RO length (i.e. from 3 to 31 stages) and the counter width are investigated on the efficiency of the sensor network in order to find the most efficient design.
- (e) An ultra-compact and ultra-sensitive RO-based temperature sensor is proposed.

To the best of our knowledge, this is the first paper that defines performance evaluation metrics for evaluating, comparing, and characterizing the efficiency of temperature sensor networks on FPGAs and explores them considering the proposed QF metric. The rest of this paper is organized as follows. Section 2 describes the background of the RO-based temperature sensor. In Section 3, the related work is presented. The proposed RO-based temperature sensor is introduced in Section 4. In Section 5, four basic evaluation metrics and some measurement methods are presented for comparing the relative performance of different sensors' designs. Section 6 describes and defines the proposed QF metric for temperature sensors. The system setup and experimental evaluation results are presented and discussed in Section 7. Finally, Section 8 concludes the paper and presents the future work.

# 2. Background

A well-known rule of VLSI chips is that the propagation delay of a circuit increases as the junction temperature increases, which results in lower frequencies. Hence, one method to measure the die temperature, originally proposed in [16], is to construct an RO and calibrate its output in Hz/°C. An RO-based temperature sensor is composed of an RO, in which conventionally an odd number of inverters is connected in a loop to form a ring, and a counter, also known as a frequency counter or capture counter, which captures the RO's oscillations over a fixed time interval called the sample period, as depicted in Fig. 1. When the number of inverters in the RO chain is odd, the RO output is unstable and toggles between "0" and "1". To reduce the destructive effects of oscillations, such as self-heating and counter overflow, the RO is controlled and gated by a logical AND gate. As the oscillation frequency of the RO is temperature-dependent, the number of oscillations captured within the fixed amount of time is logged and translated into the operating temperature, a process called sensor calibration [8]. The frequency of an RO is related to the total delay of the logic elements and interconnects in the loop. An increase in temperature increases the total delay of the RO chain. Hence, in an RO-based temperature sensor, a higher operating temperature decreases the RO's oscillation frequency, and consequently the counter holds a smaller value, and vice versa.

Fig. 1. Conventional RO-based temperature sensor.

### 2.1. Theory of operation

The number of inverters and the interconnect delay define the toggling frequency. The oscillation frequency of an RO is given by Eq. (1):

$$f = \frac{1}{t_P} = \frac{1}{2 \times n \times t_D} \tag{1}$$

where $t_P$ is the oscillation period, $t_D$ is the sum of the propagation delay of one inverter $(t_{pd})$ and its interconnect delay to the next logic stage $(t_{conn})$, as given by Eq. (2), $n$ is the number of inverters in the chain, and $f$ is the RO's oscillation frequency.

$$t_D = t_{pd} + t_{conn} \tag{2}$$
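As a small numerical illustration of Eqs. (1) and (2), the following Python sketch evaluates the oscillation frequency for a given number of stages and per-stage delays. The function name and the delay values (0.1 ns logic delay, 0.4 ns interconnect delay) are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def ro_frequency_hz(n_stages: int, t_pd_s: float, t_conn_s: float) -> float:
    """Oscillation frequency from Eq. (1), with t_D taken from Eq. (2)."""
    if n_stages < 1:
        raise ValueError("the ring needs at least one stage")
    t_d = t_pd_s + t_conn_s              # Eq. (2): per-stage delay
    return 1.0 / (2 * n_stages * t_d)    # Eq. (1): f = 1 / (2 * n * t_D)


# Hypothetical per-stage delays (assumptions, not values from the paper).
T_PD_S = 0.1e-9
T_CONN_S = 0.4e-9

f_3_stage = ro_frequency_hz(3, T_PD_S, T_CONN_S)
f_7_stage = ro_frequency_hz(7, T_PD_S, T_CONN_S)

print(f"3-stage RO: {f_3_stage / 1e6:.1f} MHz")
print(f"7-stage RO: {f_7_stage / 1e6:.1f} MHz")
```

With equal per-stage delays, the frequency scales inversely with the number of stages, which is why shorter rings oscillate faster.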

Assuming that the threshold voltage and the transconductance for both nMOS and pMOS transistors are equal, the propagation delay of an inverter is calculated using Eq. (3)[17]:

$$t_{pd} = \frac{(L/W)\,C_L}{\mu C_{ox}(V_{DD} - V_T)} \ln\left(\frac{1.5V_{DD} - 2V_T}{0.5V_{DD}}\right) \tag{3}$$

where $V_{DD}$ is the power supply voltage, $C_L$ is the load capacitance of the inverter, $C_{ox}$ is the gate-oxide capacitance per unit area, $L$ and $W$ are the length and width of the transistors, respectively, and $\mu$ and $V_T$ are the carrier mobility and the threshold voltage of the transistors, which are given by Eqs. (4) and (5), respectively [18]:

$$\mu = \mu_0 \left(\frac{T}{T_0}\right)^{k_m}; \quad k_m = -1.2 \sim -2 \tag{4}$$

$$V_T(T) = V_T(T_0) + \alpha(T - T_0); \quad \alpha = -0.5 \sim -3\ \mathrm{mV/K} \tag{5}$$

where $T$ is the absolute temperature, $T_0$ is the nominal temperature, and $\mu_0$ is the carrier mobility at this temperature. Since $k_m$ and $\alpha$ are negative constants, both $\mu$ and $V_T$ decrease as the junction temperature increases. If $V_{DD} \gg V_T$, the thermal behavior of the propagation delay is dominated by the carrier mobility in the denominator of Eq. (3), and hence the thermal coefficient of the propagation delay is positive. So, an increase in temperature strongly increases the propagation delay of inverters. However, as discussed in Section 4, in modern CMOS process technologies the condition $V_{DD} \gg V_T$ no longer holds; hence, the temperature dependence of the propagation delay of an inverter is now very weak. In addition, the electrical resistance of the interconnect $R_E$ has a linear relationship with its temperature, as given by Eq. (6) [19]:

$$R_E(x) = R_0\left(1 + \beta\, T(x)\right) \tag{6}$$

where $R_0$ is the resistance per unit length at a nominal temperature, $\beta$ is the temperature coefficient of resistance, and $T(x)$ is the temperature profile along the length of the interconnect line. According to Eq. (6), an increase in temperature increases the interconnect delay by adding extra distributed *RC* delay to the Elmore delay. Consequently, in an RO, the higher the junction temperature is, the lower the oscillation frequency, and vice versa.
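The competing temperature effects of $\mu$ and $V_T$ in the denominator of Eq. (3) can be illustrated with a short Python sketch. It evaluates only the factor $\mu(T)\,(V_{DD} - V_T(T))$ using Eqs. (4) and (5) and deliberately leaves out the logarithmic factor of Eq. (3). All parameter values ($T_0 = 300$ K, $V_T(T_0) = 0.4$ V, $k_m = -1.5$, $\alpha = -2$ mV/K, and the two supply voltages) are assumptions picked from within the stated ranges, not values taken from this work.

```python
T0_K = 300.0  # assumed nominal temperature


def mobility_rel(t_k: float, k_m: float = -1.5) -> float:
    """Eq. (4), normalized to mu_0: mu / mu_0 = (T / T_0) ** k_m."""
    return (t_k / T0_K) ** k_m


def threshold_v(t_k: float, vt0_v: float = 0.4,
                alpha_v_per_k: float = -2e-3) -> float:
    """Eq. (5): V_T(T) = V_T(T_0) + alpha * (T - T_0)."""
    return vt0_v + alpha_v_per_k * (t_k - T0_K)


def denominator_rel(v_dd: float, t_k: float) -> float:
    """mu(T) * (V_DD - V_T(T)), normalized to its value at T_0.

    A value below 1 means the denominator of Eq. (3) has shrunk,
    i.e. this factor pushes the propagation delay up.
    """
    return (mobility_rel(t_k) * (v_dd - threshold_v(t_k))
            / (v_dd - threshold_v(T0_K)))


high_vdd = denominator_rel(3.3, 400.0)  # V_DD much larger than V_T
low_vdd = denominator_rel(1.0, 400.0)   # low-voltage core

print(f"V_DD = 3.3 V: denominator factor at 400 K = {high_vdd:.3f}")
print(f"V_DD = 1.0 V: denominator factor at 400 K = {low_vdd:.3f}")
```

Under these assumed parameters the denominator factor drops by roughly 31% at 3.3 V but only by roughly 13% at 1.0 V, because the falling threshold voltage partly offsets the falling mobility when $V_{DD}$ is close to $V_T$.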

# 3. Related work

The related work can be grouped into the following three main categories: 1) sensor calibration, 2) sensor designs, and 3) temperature sensor networks.

### 3.1. Sensor calibration

Since the oscillation frequency of an RO is temperature-dependent, the output of the sensor needs to be calibrated to link the frequency response of the RO to the temperature of its surroundings. Due to process variation and nondeterministic routing algorithms in modern FPGAs, each sensor in an array of temperature sensors must be calibrated separately. Two main calibration approaches are described in the literature. One approach is based on the built-in thermal diode [6,13,20] that is integrated into some devices and can be accessed inside the FPGA using the dedicated Xilinx system monitor and Altera temperature sensor intellectual property (IP) cores [3,21]. Another method relies on external devices such as a climate chamber or a temperature-controlled oven [7,9,22,23]. The temperature measured by the temperature-controlled oven is used as the reference temperature for the sensor calibration process. The major drawback of the latter approach is the limited temperature range over which sensors can be calibrated: modern FPGAs can operate over a wide range of temperatures (e.g. up to 120 °C), but evaluation boards are stressed and may fail at these operating temperatures. To address this issue, a systematic study of heat-generating cores is performed in [24], and seven ways are introduced to generate heat on modern FPGAs by utilizing different available resources of the device. More recently, Weber et al. [8] present a calibration method for RO-based temperature sensors in FPGAs that employs a mixed approach to overcome the intra-sensor variation.
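The per-sensor calibration described above can be sketched in Python as a separate least-squares line fit for each sensor, mapping a counter value to a reference temperature. This is only a simplified illustration: the sensor names, counter values, and reference temperatures are hypothetical, and a linear model is assumed even though the frequency-temperature response is not perfectly linear in practice.

```python
def fit_line(counts, temps_c):
    """Least-squares fit of temp = slope * count + offset."""
    n = len(counts)
    mean_x = sum(counts) / n
    mean_y = sum(temps_c) / n
    sxx = sum((x - mean_x) ** 2 for x in counts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(counts, temps_c))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration data: (counter values, reference temperatures in C).
# Each sensor gets its own model because of process and routing variation.
calibration = {
    "sensor_0": ([5200, 5150, 5100, 5050], [30.0, 40.0, 50.0, 60.0]),
    "sensor_1": ([5320, 5260, 5200, 5140], [30.0, 40.0, 50.0, 60.0]),
}
models = {name: fit_line(c, t) for name, (c, t) in calibration.items()}


def estimate_temp_c(name: str, count: int) -> float:
    """Translate a raw counter value into a temperature estimate."""
    slope, offset = models[name]
    return slope * count + offset


print(round(estimate_temp_c("sensor_0", 5125), 2))
print(round(estimate_temp_c("sensor_1", 5200), 2))
```

Both fitted slopes are negative, which matches the expected behavior that a higher temperature yields a smaller counter value.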

### 3.2. Sensor designs

The RO-based temperature sensor is composed of two main components, 1) the RO and 2) the counter, and each component may have different designs. In a conventional design, a binary counter is used as the capture counter and a series of inverters (i.e. NOT gates) as the delay line. For instance, Velusamy et al. [25] use an RO comprising 7 inverters and a capture counter to design a sensor. The work in [12] utilizes a chain of 3 inverters to study the RO behavior in a 1.0 V low-voltage core FPGA. In [15], the authors design a temperature sensor for regulating a Thermopile Peltier cooler. Unlike the conventional structure, they utilize 109 XORCY primitive cells instead of NOT gates to construct the delay line. The XORCY is a special XOR for carry-chain logic functions that is available in the configurable logic blocks (CLBs) of Xilinx FPGAs [26]. Zick and Hayes [7] present a compact temperature sensor composed of a 3-stage RO and a residue number system (RNS) ring counter instead of a binary counter, because binary counters incur large overheads in a sensor network that has a considerable number of sensors. Also, an open latch is instantiated along with each inverter to increase the sensitivity of the sensor to temperature variation [7].

### 3.3. Temperature sensor networks

Many researchers, such as [6-11], utilize a network of RO-based temperature sensors to measure the thermal distribution of FPGA-based systems. Lopez-Buedo and Boemo [10] create a grid-based temperature sensor network and allocate an array of 4×8 sensors to monitor the thermal behavior of a Virtex FPGA. Also, Lopez-Buedo et al. [5] present a sensor network that can be dynamically inserted, operated, and eliminated from the system using run-time reconfiguration. Their sensor is composed of an RO constructed using 7 inverters, a 14-bit capture counter, and a time base counter to control all of the sensor activity. Velusamy et al. [25] first present the design of five temperature sensors that monitor the local temperatures of the FPGA chip and then validate the temperatures obtained from the RO-based temperature sensors against values obtained from HotSpot, an accurate and fast thermal modeling simulator. In [7], 112 sensors are arranged on a hexagonal grid of size 16×7, and some approaches are presented for sensing variations in delay, temperature, switching-induced IR drop, and leakage-induced IR drop in a Xilinx Virtex-5 FPGA.

The work in [8] presents a configurable toolset for analyzing the thermal behavior of FPGA chips. The toolset can insert RO-based temperature sensors and heat-generator cores into the design and control them, inspect the floorplan of a given project, and communicate with a personal computer (PC). The authors use a 7-stage RO and a 16-bit capture counter to design a temperature sensor and then construct a grid-based network of 15×10 temperature sensors in order to analyze the thermal behavior of the Xilinx Spartan-3E and Spartan-6 FPGAs. Recently, in [9], a regular grid-based network of 4×6 sensors is utilized to provide a thermal map of an Altera Cyclone III FPGA. The map is then compared with an image of the chip captured by an infrared camera. The authors show that there is a good match between the thermal map generated by the sensor network and the infrared image, with a small offset due to the effects of the package surface. Moreover, the influence of the number of inverters on the power consumption and measurement error of a conventional design is investigated in [9]. Unfortunately, some other important factors, such as the area and thermal overheads of the sensor network, are not considered or measured in [9]. Also, in order to study the influence of the number of delay elements on the sensor performance, a relative performance metric based on the trade-off between sensor noise and resolution is proposed in [13]. This metric is useful for evaluating the relative performance of only one sensor, not a sensor network.

Although a variety of RO-based temperature sensor designs have been proposed in the literature for monitoring the thermal behavior of FPGAs, to the best of our knowledge there is no single metric for evaluating the efficiency of different sensor designs in a network. This paper surveys different kinds of relative performance evaluation metrics that are used to evaluate the efficiency of different sensor designs in a network according to the proposed QF metric. In the next section, we first present the proposed RO-based temperature sensor, because it introduces a new RO design; then, four basic evaluation metrics, which are useful for comparing and evaluating the relative performance of different sensor networks, are introduced in Section 5.

# 4. Proposed RO-based temperature sensor

In this section, a fully digital temperature sensor without any analog components is introduced. It is composed of a novel RO design and an RNS ring counter, which make it both ultra-sensitive and more compact. Regardless of the sensor type, there are a number of common characteristics that need to be considered for implementation, such as temperature measurement range, power consumption, reliability, accuracy, linearity, and sensitivity [27]. Linearity defines how consistently the sensor's output changes over a temperature range. Unfortunately, the behavior of RO-based temperature sensors has become less linear with technology scaling. Franco et al. [12] investigate the RO behavior at low voltages on a Virtex-5 FPGA and report increased non-linearity of the frequency-temperature response at a 1.0 V core voltage. Also, in [7], the authors report that a 2nd-order polynomial model gives 2.2 °C less error in the estimated temperature than the traditional linear model.

Higher sensitivity allows sensors to sense finer changes in temperature and hence gives better accuracy. With voltage down-scaling and more advanced process technologies, the dependence of the RO frequency on temperature, called sensitivity, decreases and has become a bottleneck in modern FPGAs [7]. This is because, at a lower power supply voltage $V_{DD}$, the term $(V_{DD}-V_T)$ in the denominator of Eq. (3) is no longer dominated by $V_{DD}$. In other words, considering the thermal effect of the threshold voltage and the fact that, unlike $\mu$, the term $-V_T$ increases as the temperature increases, the thermal effects of $V_T$ and $\mu$ almost balance each other. Hence, the temperature dependence of the propagation delay of an inverter is now very weak. Note that the sensitivity of a sensor depends only on the RO design, not on the counter, since the RO is the temperature-sensitive circuit and the counter only captures its temperature-dependent oscillations. Indeed, over the range of operating temperatures, the larger the change in the oscillation frequency of an RO, the higher the sensor sensitivity and the better the accuracy [7]. In other words, the RO frequency change should be increased to overcome the destructive effect of advanced technologies, i.e. sensitivity degradation. One approach relies on reducing the number of RO stages [7]. However, RO designs shorter than 3 stages (i.e. 1-stage) traditionally oscillate at extreme frequencies, consume watt-level power, generate a great deal of heat, and thus cannot be used as temperature sensors. Moreover, at such high frequencies, the oscillation pulses may not be reliable and the counter may operate unreliably. As an alternative, we introduce an efficient method that increases the oscillation frequency change as desired while keeping the operation of the system reliable, in order to improve the sensor's sensitivity.

### 4.1. Ring-oscillator design

If the total delay $t_D$ of the RO chain can be reduced (e.g. by reducing the number of RO stages) while the oscillation frequency does not become as high as that of a 1-stage RO and the noise level remains almost constant, the RO frequency change can be increased as desired and, consequently, the sensitivity of the sensor increases. The work in [7] presents an RO in which an even number of inverters (i.e. 2 inverters) are connected to each other and the controlling logical AND gate of the conventional design (see Fig. 1) is replaced by a logical NAND gate. In this case, one LUT is eliminated from the RO circuit, which decreases both the total delay of the chain and the resources occupied by the RO, while the RO output still toggles between "0" and "1". In order to further improve the sensitivity and reduce the resource utilization of the RO, we propose a novel technique to implement the RO circuit. Our specific technique is to implement the logic elements of the RO circuit using a primitive cell called *CFGLUT5* [26]. This component is a runtime element that can be reconfigured dynamically during the operation phase and occupies only one LUT within a SLICEM. One of the main features of this primitive is that, unlike conventional LUTs (i.e. LUT1, LUT2, etc.), which can be configured to implement only one logical function per LUT, the CFGLUT5 can optionally be configured to create two individual 4-input functions in a single LUT, which consequently decreases the number of elements in the RO chain.

There are various approaches to designing a delay line; they have different propagation delays and thus result in different RO oscillation frequencies. Table 1 compares the propagation delay of different primitive cells, i.e. LUT1, CFGLUT5, and XORCY, in the configurations depicted in Fig. 2, on a Virtex-5 FPGA. These values are obtained from the Xilinx tool (i.e. FPGA Editor) after the place and route (PAR) process. The tool automatically generates and includes accurate delays for all circuit nodes. For a fair comparison, these delays are calculated after the circuits have been mapped to a single fixed location in the target chip, using physical constraints such as Xilinx's location (LOC) and BEL constraints [28]. Therefore, these values are quite accurate. Comparing designs (a) and (b) shows that the propagation delays of the LUT1 and CFGLUT5 primitive cells, both configured as an inverter, are equal (i.e. $t_{pd}=0.086 \text{ ns}$), as seen in Table 1. In contrast, the propagation delay of the XORCY, which is part of the carry chain circuit, is much higher (i.e. 4.7 times) than that of the LUT1 due to the access constraint to the carry chain.

Table 1. Comparison of propagation delay of different primitive cells on a Xilinx Virtex-5 FPGA.

| Design number | Design structure      | Propagation delay (ns) |
|---------------|-----------------------|------------------------|
| a             | LUT1                  | 0.086                  |
| b             | CFGLUT5 (1 function)  | 0.086                  |
| c             | XORCY                 | 0.404                  |
| d             | LUT1+LUT1             | 0.840                  |
| e             | CFGLUT5 (2 functions) | 0.677                  |

Fig. 2. Various approaches to design a delay line: (a) LUT1 configured as an inverter, (b) CFGLUT5 configured as an inverter, (c) XORCY, (d) two series LUT1s configured as inverters, and (e) CFGLUT5 configured as two series inverters.

In order to construct a delay line, designers usually connect inverters in series, as shown in Fig. 2d. The total delay of two LUT1s in series configured as inverters (i.e. $t_{Dd}=0.840$ ns) is 9.77 times higher than that of a single LUT1 for the chip under test (see Table 1). Compared to design (d), the total delay of the CFGLUT5 element configured to implement two functions (i.e. inverters) in a single LUT (Fig. 2e) is about 20% lower. Moreover, this structure (i.e. design (e)) saves one LUT. Based on these results, we propose a technique to reduce the total delay $t_D$ of the RO chain. According to Eq. (1), reducing $t_D$ is the key to increasing the operating frequency of the RO. Therefore, choosing an appropriate implementation style has a significant effect.
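The ratios quoted in the text can be reproduced directly from the Table 1 delays. The short Python check below uses only the tabulated values; the variable names are chosen here for illustration.

```python
# Propagation delays from Table 1, in nanoseconds.
delay_ns = {
    "a": 0.086,  # LUT1
    "b": 0.086,  # CFGLUT5 (1 function)
    "c": 0.404,  # XORCY
    "d": 0.840,  # LUT1+LUT1
    "e": 0.677,  # CFGLUT5 (2 functions)
}

xorcy_vs_lut1 = delay_ns["c"] / delay_ns["a"]         # about 4.7
two_lut1_vs_lut1 = delay_ns["d"] / delay_ns["a"]      # about 9.77
cfglut5_saving = 1.0 - delay_ns["e"] / delay_ns["d"]  # about 0.19

print(f"XORCY vs. LUT1:            {xorcy_vs_lut1:.2f}x")
print(f"Two LUT1s vs. one LUT1:    {two_lut1_vs_lut1:.2f}x")
print(f"CFGLUT5 (e) vs. design (d): {cfglut5_saving:.1%} lower delay")
```

The fact that two LUT1s in series are almost ten times slower than one, rather than twice as slow, shows how much of the chain delay comes from the connection between LUTs and not from the LUTs themselves.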

In this work, two CFGLUT5 elements are utilized to design a 3-stage RO, which is implemented in 2 LUTs instead of the 4 LUTs of a conventional design, as shown in Fig. 3. The first primitive cell (i.e. CFGLUT5 #1) is configured as a 3-input LUT with two different functions, i.e. the controlling logical AND gate and an inverter (NOT gate). The other primitive cell (i.e. CFGLUT5 #2) is configured as a 2-input LUT to create two individual inverters in a single LUT. An INIT attribute should be specified on each CFGLUT5 primitive cell to indicate its logical function(s). The INIT value of 0F0F8888 (hexadecimal) is set for the first element (i.e. CFGLUT5 #1), which represents one inverter along with a 2-input AND gate (AND2) that controls the RO's oscillations. Also, the INIT value of X"33335555" configures CFGLUT5 #2 as two individual inverters; we use the O5 output in combination with the O6 output to create two series inverters. We also instantiate an open latch (LD) along with each element to further improve the sensitivity of the RO, as depicted in Fig. 4. A latch in the open state, as the name suggests, acts as an additional transistor-based wire. Fortunately, unlike connections between LUTs, the latch does not need significant routing resources to connect to the adjacent LUT's output, since one latch is always available near each LUT in FPGA slices. So, this method does not significantly reduce the RO's oscillation frequency. Because the effect of temperature on transistor delay is greater than on routing delay [7,13], instantiating an open latch along with each CFGLUT5 element improves the sensor's sensitivity.

Fig. 3. Modified 3-stage RO design.
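The two INIT values above can be decoded with a few lines of Python to see which logic functions they encode. The sketch rests on an assumed addressing convention that should be checked against the CFGLUT5 documentation [26]: the O6 output selects INIT bit {I4, I3, I2, I1, I0}, the O5 output selects from the lower 16 INIT bits using {I3, I2, I1, I0}, and I4 is tied high. The mapping of functions to specific input pins is inferred from the INIT values under that assumption and is not stated in the text.

```python
from itertools import product


def lut_bit(init: int, index: int) -> int:
    """Return bit `index` of the INIT word."""
    return (init >> index) & 1


def o6(init: int, i4: int, i3: int, i2: int, i1: int, i0: int) -> int:
    """Assumed 5-input output: INIT bit addressed by {I4, I3, I2, I1, I0}."""
    return lut_bit(init, (i4 << 4) | (i3 << 3) | (i2 << 2) | (i1 << 1) | i0)


def o5(init: int, i3: int, i2: int, i1: int, i0: int) -> int:
    """Assumed 4-input output: lower 16 INIT bits addressed by {I3, I2, I1, I0}."""
    return lut_bit(init & 0xFFFF, (i3 << 3) | (i2 << 2) | (i1 << 1) | i0)


INIT_CFGLUT5_1 = 0x0F0F8888  # AND2 plus one inverter
INIT_CFGLUT5_2 = 0x33335555  # two inverters

inputs = list(product((0, 1), repeat=4))  # all (i3, i2, i1, i0) combinations

# CFGLUT5 #1: O5 = I0 AND I1 (gating), O6 = NOT I2 (inverter), with I4 = 1.
first_ok = all(
    o5(INIT_CFGLUT5_1, i3, i2, i1, i0) == (i1 & i0)
    and o6(INIT_CFGLUT5_1, 1, i3, i2, i1, i0) == 1 - i2
    for i3, i2, i1, i0 in inputs
)

# CFGLUT5 #2: O5 = NOT I0 and O6 = NOT I1, with I4 = 1.
second_ok = all(
    o5(INIT_CFGLUT5_2, i3, i2, i1, i0) == 1 - i0
    and o6(INIT_CFGLUT5_2, 1, i3, i2, i1, i0) == 1 - i1
    for i3, i2, i1, i0 in inputs
)

print("CFGLUT5 #1 decodes to AND2 + inverter:", first_ok)
print("CFGLUT5 #2 decodes to two inverters:  ", second_ok)
```

Under this convention the first LUT uses three distinct inputs and the second uses two, which agrees with their description as a 3-input and a 2-input LUT.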

Note that, by default, the synthesis tool removes the inverting elements in the RO chain during logic compilation or optimization in order to get better results, i.e. lower utilization and higher speed. To solve this problem, we use the KEEP attribute in our hardware description language (HDL) code (KEEP="TRUE"), which is a synthesis constraint that prevents the signal from being optimized away throughout the implementation process.

# 4.2. Counter design

There are two approaches for implementing a sensor network to measure local temperatures and monitor the thermal distribution of the chip. One is to implement a single centralized counter shared by multiple ROs, called serial reading [8]. Two main drawbacks of this method are the loss of spatial thermal data and a high probability of significant error in the estimated temperature, since the sensors' data are read one by one. Note that, especially in low-voltage FPGA cores, the oscillation frequency of an RO is a function not only of the local on-chip temperature but also of the power supply voltage. Because parameters such as voltage or current can change dramatically within several milliseconds, serial reading can cause significant error in the estimated temperature [7,29]. The other approach, parallel reading, is based on reading all sensors' data simultaneously. In this method, each RO has an associated counter connected to its output, which allows all sensors to be enabled at the same time and gives designers the ability to take a snapshot across the chip. However, in a sensor network with a considerable number of sensors, pairing each RO with its own binary counter incurs large area and power overheads, forcing a trade-off between area/power overhead and reading method. Researchers widely use binary counters in RO-based temperature sensors because a binary counter is easy to realize and its output needs no decoding, which makes it easy to use. On the other hand, one of the trade-offs in designing an RO-based temperature sensor is between RO length and binary counter width. Generally, the shorter the RO is, the higher its oscillation frequency and the wider the binary counter must be. Note that, to avoid counter overflow, an RO with a higher frequency requires a wider counter.
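The relation between RO frequency and binary counter width can be made concrete with a short sketch. The function name, the example frequencies, and the 40 μs sample window are illustrative assumptions rather than measurements from this work; the sketch only computes the smallest counter width that does not overflow for a given frequency and sample period.

```python
import math


def min_counter_width(freq_hz: float, sample_period_s: float) -> int:
    """Smallest binary counter width (in bits) that holds the pulse count.

    A w-bit binary counter stores values 0 .. 2**w - 1, so the width must
    cover the largest count expected during one sample period.
    """
    max_count = math.floor(freq_hz * sample_period_s)
    return max(1, max_count.bit_length())


if __name__ == "__main__":
    sample_period_s = 40e-6  # assumed sample window
    for freq_mhz in (100, 400, 800):  # hypothetical RO frequencies
        width = min_counter_width(freq_mhz * 1e6, sample_period_s)
        print(f"{freq_mhz} MHz -> {width}-bit counter")
```

Under these assumptions, doubling the RO frequency adds roughly one bit to every counter in a parallel-reading network, which is why the counter design matters for the area overhead.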

In addition to an ultra-sensitive RO, we use an RNS ring counter in the proposed sensor in order to design a compact and low-overhead sensor. A compact sensor gives the opportunity to embed more sensors in the network, detect more hotspots, and consequently improve the accuracy of DTM techniques. An RNS ring counter, which can be implemented very compactly using Xilinx's shift register LUT (SRL) primitive cells, is a circular shift register that is initialized such that only one of its registers is in the one state while the others are in the zero state. To clarify the resources occupied by each counter design, it should be noted that compared to a 10-bit binary counter, which can



Fig. 4. The proposed 3-stage RO design (including open latches) implemented in 2 LUTs.



Fig. 5. The 3-moduli set RNS ring counter configuration in the proposed temperature sensor.

count up to  $2^{10}$=1024, an RNS ring counter with almost the same maximum count (i.e.  $32 \times 33$=1056) occupies 5 times fewer LUTs and 10 times fewer flip-flops (FFs). As noted in [26], a Xilinx SRL32 with a variable length of 1 to 32 bits can be implemented within a single LUT. The 3-moduli set RNS ring counter, which is clocked by the proposed RO, is used in our sensor design as shown in Fig. 5. The counting period *M*, also known as *count<sub>max</sub>*, is obtained by multiplying the lengths of all SRLs as

$$M = \prod_{i=1}^{n} m_i \tag{7}$$

where n is the number of moduli and  $m_i$  is a pairwise relatively prime modulus.

The only implementation restriction is that the lengths of all SRLs must be pairwise relatively prime. To meet this restriction and also to avoid counter overflow, we use 3 LUTs configured as SRLs, each in a ring (see Fig. 5). The first LUT acts as a 31-bit shift register, the second as a 32-bit shift register, and the third LUT, accompanied by one FF, as a 33-bit (32-bit plus 1-bit) shift register. With three rings the maximum count value reaches count<sub>max</sub>=31×32×33=32736, which is enough for the counter to operate reliably with respect to the frequency of the proposed RO. The 5-bit address bus A is tied to a fixed value of 30, 31, and 31 for SRLC32E #1, #2, and #3 to signify a fixed 31-, 32-, and 32-bit shift length, respectively. Note that, in an SRLC32E primitive cell, the shift register length is the address input plus one. The INIT attribute of the SRL32, a 32-bit hexadecimal value, can be specified to indicate the initial shift pattern of the shift register. Typically, a pattern consisting of a single hot bit is circulated so that the state repeats every n clock cycles if an n-bit SRL is used. INIT [31] and INIT [30], the last value shifted out, are set to "1" for SRLC32E #1 and #2, respectively. For the third SRL the INIT value is not specified, so it defaults to all zeros. However, the INIT value of its associated FF is set to "1" in order to make the counter data easy to decode.
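As a minimal software sketch of the ring counter described above, the following code checks the pairwise relatively prime restriction, evaluates the counting period of Eq. (7), and rotates a one-hot ring to show that the hot-bit position tracks the pulse count modulo the ring length. The function names are hypothetical, and the hot bit is assumed to start at index 0 for simplicity, which differs from the INIT patterns used in the hardware.

```python
from itertools import combinations
from math import gcd, prod


def pairwise_coprime(moduli):
    """True if every pair of ring lengths shares no common factor."""
    return all(gcd(a, b) == 1 for a, b in combinations(moduli, 2))


def counting_period(moduli):
    """Counting period M of Eq. (7): the product of all ring lengths."""
    if not pairwise_coprime(moduli):
        raise ValueError("ring lengths must be pairwise relatively prime")
    return prod(moduli)


def hot_bit_position(length, pulses):
    """Rotate a one-hot ring `pulses` times and return the hot-bit index.

    Simplifying assumption: the hot bit starts at index 0.
    """
    ring = [1] + [0] * (length - 1)
    for _ in range(pulses):
        ring = [ring[-1]] + ring[:-1]  # circular shift by one position
    return ring.index(1)


if __name__ == "__main__":
    moduli = (31, 32, 33)
    print("counting period:", counting_period(moduli))
    pulses = 1000  # hypothetical number of RO pulses
    print("residues:", [hot_bit_position(m, pulses) for m in moduli])
```

Each ring only records the count modulo its own length; the three positions together identify the count uniquely as long as it stays below the counting period.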

The output of an RNS ring counter is not as straightforward as that of a binary counter and needs to be decoded with the Chinese remainder theorem (CRT) using Eq. (8) [30]:

$$Count = \left[\sum_{i=1}^{n} r_i \left(\frac{M}{m_i}\right) w_i\right] \bmod M \tag{8}$$

where *n* is the number of moduli,  $r_i$  is a residue, *M* is the counting period,  $m_i$  is a pairwise relatively prime modulus, and  $w_i$  is a weight found with the Euclidean algorithm. Executing the CRT algorithm to decode all the RNS ring counters in a sensor network adds a performance overhead, but this overhead is negligible.
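The decoding step of Eq. (8) can be sketched in a few lines. Here the weight $w_i$ is taken to be the multiplicative inverse of $M/m_i$ modulo $m_i$, and it is obtained with Python's built-in three-argument `pow` (Python 3.8 or later) instead of an explicit Euclidean algorithm. The function name `crt_decode` and the example count are assumptions made for illustration; this is not the C code that runs on the MicroBlaze.

```python
from math import prod


def crt_decode(residues, moduli):
    """Recover the count from its residues using Eq. (8)."""
    period = prod(moduli)  # counting period M
    total = 0
    for r_i, m_i in zip(residues, moduli):
        partial = period // m_i      # M / m_i
        w_i = pow(partial, -1, m_i)  # inverse of M / m_i modulo m_i
        total += r_i * partial * w_i
    return total % period


if __name__ == "__main__":
    moduli = (31, 32, 33)
    count = 12345  # hypothetical pulse count below the counting period
    residues = tuple(count % m for m in moduli)
    print(residues, "->", crt_decode(residues, moduli))
```

Since the moduli are fixed, the products of $M/m_i$ and $w_i$ could be precomputed once, leaving three multiplications and one modulo operation per sensor reading.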

Fig. 6 illustrates the design of the proposed RO-based temperature sensor. As depicted, the sensor requires a 20-bit data bus to connect to the 32-bit MicroBlaze soft microprocessor core via the processor local bus (PLB), allowing the MicroBlaze to control the activity of the sensor and also access the data (i.e. 3-bit output data and 15-bit address bus) to find the residue  $r_i$ . Note that the enable signal of the RO (i.e. EN<sub>1</sub>) and the enable signal of the RNS ring counter (i.e. EN<sub>2</sub>) are asynchronous, and enabling or disabling the RO sometimes causes a glitch. Such occasional glitches may corrupt the counter value. To address this issue, a simple and effective technique is used to keep the operation of the system safe and reliable. After the RO is enabled by the MicroBlaze (i.e.  $EN_1=1$ ), we first wait for 2<sup>10</sup> clock cycles so that the RO reaches a steady state with a stable and constant frequency, and then activate the enable signal of the counter (i.e.  $EN_2=1$ ) for a fixed time interval. Then, the counter is disabled first and the RO is disabled afterwards, in order to avoid counting occasional glitches. After taking a snapshot of the chip and disabling the sensor (i.e.  $EN_1 = EN_2 = 0$ ), the 32-bit MicroBlaze sweeps the 15-bit address bus of the sensor (a 5-bit address bus A for each SRL) and simultaneously reads the 3 bits of output data of the sensor (one bit for each SRL output) via the PLB to find the residue  $r_i$  of the corresponding SRL. Note that the position of each "1" bit represents the residue. Hence, when the 3-bit output data is equal to "111", the values of the three 5-bit address buses of the sensor (i.e. the residues) are read by the MicroBlaze. Finally, the CRT algorithm is executed to



Fig. 6. The proposed RO-based temperature sensor design.

decode the counter value. Then, immediately, the aforementioned procedure is repeated four more times, and finally we average the five counter values in order to minimize the effect of glitches on the final counter value. The comparison results for resource utilization, temperature measurement range, and sensitivity of the proposed sensor with other existing sensor designs are reported in Section 7.

# 5. Basic evaluation metrics for RO-based temperature sensors

Designers of a sensor network face various choices, including the number of sensors, RO design, counter design, RO length, and counter width, and each design in this large space has different characteristics. While exploring such a large design space, some design trade-offs should be considered because different design options affect the efficiency of the sensor network. A large variety of designs have been proposed for RO-based temperature sensors in the literature. The RO designs include the following basic elements: 1) inverter (INV), 2) INV & latch (LD), 3) XORCY, 4) XORCY & LD, 5) CFGLUT5, and 6) CFGLUT5 & LD (proposed RO), each of which can be implemented with variable length. Also, the counters can be implemented as 1) the binary counter or 2) the RNS ring counter, each of which can be implemented with variable width. When designing a network of temperature sensors, some important parameters, such as overheads and accuracy, need to be considered at design time. In contrast to related work, we survey different kinds of relative performance evaluation metrics that can be used to evaluate the efficiency of different designs of the RO-based temperature sensor in a network. In this section, we present a set of basic evaluation metrics, namely the area, thermal, and power overheads and the thermal map error, together with the methods used to measure them. These metrics provide useful information for decision making; based on them, the designer can make a proper decision.

# 5.1. Area overhead

When designing a temperature sensor network, the resource utilization is an important metric that should be considered because it may affect the power consumption as well as the heat generated by sensors. The area overhead  $A_{OH}$  of a sensor network is calculated in terms of the total number of utilized LUTs and FFs/latches of the FPGA using Eq. (9):

$$A_{OH} = A_{TotalNet} \tag{9}$$

where  $A_{TotalNet}$  is the total resources utilized by the sensor network, as reported by the Xilinx tool (i.e. PlanAhead) after the design has been implemented by the PAR process. To make the details clearer, we formulate the area overhead of the RO-based sensor network as

$$A_{OH} = A_{TotalNet} = A_{SensNet} + A_{PLB} \tag{10}$$

where the  $A_{PLB}$  is the resource utilization of the PLB and  $A_{SensNet}$  is the resources occupied by sensors in the network, which can be expressed as

$$A_{SensNet} = N_S \times A_{Sens} \tag{11}$$

where  $N_S$  is the number of sensors in the network and  $A_{Sens}$  is the resource utilization of a single sensor, which can be expressed as

$$A_{Sens} = A_{RO} + A_{Counter} \tag{12}$$

where  $A_{RO}$  and  $A_{Counter}$  are the resources occupied by the RO and the counter, respectively. By substituting (11) and (12) in (10), we can calculate the area overhead of a sensor network by

$$A_{OH} = N_S \times (A_{RO} + A_{Counter}) + A_{PLB} \tag{13}$$

According to (13), and given that for the same number of sensors in a network the resources occupied by the PLB (i.e.  $A_{PLB}$ ) are constant, more compact sensors are expected to yield less area overhead, especially in a network with a large number of temperature sensors.
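Eq. (13) can be evaluated directly, as in the sketch below. The function name and all numbers in the example (sensor count, per-sensor LUT counts, and the PLB cost) are placeholders chosen for illustration and are not results reported in this work.

```python
def area_overhead(n_sensors, a_ro, a_counter, a_plb):
    """Area overhead of Eq. (13): N_S * (A_RO + A_Counter) + A_PLB."""
    if n_sensors < 0 or min(a_ro, a_counter, a_plb) < 0:
        raise ValueError("resource counts must be non-negative")
    return n_sensors * (a_ro + a_counter) + a_plb


if __name__ == "__main__":
    a_plb = 100     # hypothetical PLB interface cost in LUTs
    n_sensors = 35  # placeholder network size
    compact = area_overhead(n_sensors, a_ro=2, a_counter=3, a_plb=a_plb)
    larger = area_overhead(n_sensors, a_ro=3, a_counter=5, a_plb=a_plb)
    print("compact sensor:", compact, "LUTs")
    print("larger sensor: ", larger, "LUTs")
```

Because the per-sensor term is multiplied by the number of sensors, the gap between two designs grows linearly with the network size while the PLB term stays fixed.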

# 5.2. Thermal overhead

The thermal overhead of an RO-based sensor network needs to be taken into account since the sensor itself generates heat when active due to the oscillations of the RO, an effect called self-heating. Therefore, the ROs should be gated to reduce the destructive self-heating effects of the sensor network. Although clock gating reduces switching power consumption and hence the temperature rise, in a runtime application that continuously takes snapshots of the design it still increases the chip temperature. In other words, the thermal overhead of a sensor network must be low. While most researchers, e.g. [6-9,12], acknowledge this phenomenon, to the best of our knowledge there is no systematic study on measuring the self-heating effect of sensor networks. In fact, in FPGA-based designs, each application has its own thermal behavior and increases the die temperature by a certain amount. To measure the thermal overhead of a sensor network, a fixed heat generator circuit (heater), whose generated heat is controllable, is designed and implemented on the FPGA [24]. We tune the temperature generated by the heat generator circuit according to the results obtained from a study of the temperature of several benchmarks implemented on an FPGA [31]. The thermal overhead of a network of soft-sensors is calculated based on the following four steps:

- (i) All sensors in the network are deactivated and the heat generator circuit, which acts as the benchmark, increases the die temperature from a minimum operating temperature (*T<sub>min</sub>*) to a maximum (*T<sub>max</sub>* or *T<sub>withoutNet</sub>*). Assume this operation takes *t<sub>m</sub>* seconds.
- (ii) The FPGA is cooled down to  $T_{min}$ .
- (iii) The sensor network is activated immediately and the heater runs for t<sub>m</sub> seconds. Assume the final temperature is T<sub>withNet</sub>.
- (iv) The thermal overhead  $T_{OH}$  is defined and formulated as the differential on-chip temperature due to using a sensor network and is calculated using Eq. (14):

$$T_{OH} = T_{withNet} - T_{withoutNet} \tag{14}$$

where  $T_{withNet}$  and  $T_{withoutNet}$  represent the on-chip temperature with and without embedding a network of sensors, respectively.

# 5.3. Power overhead

The power overhead of a sensor network is an important and useful metric for decision making at design time. It should be as low as possible because, in addition to the resources utilized by the actual design, the sensors also consume power and hence add extra power consumption. In an RO-based temperature sensor network, the dynamic power dissipation  $P_{dynamic}$  depends on the ROs' oscillation frequencies and the resources occupied by the counters, as in Eq. (15):

$$P_{dynamic} = P_{dyn,RO} + P_{dyn,Counter} \tag{15}$$

where  $P_{dyn,RO}$  and  $P_{dyn,Counter}$  are the dynamic power consumed by the ROs and the counters, respectively. The steps for calculating the power overhead are very similar to those for the thermal overhead. The power overhead  $P_{OH}$  of a sensor network is defined as the differential average power consumption due to utilizing an array of temperature sensors and is calculated using Eq. (16):

$$P_{OH} = P_{withNet} - P_{withoutNet} \tag{16}$$

where  $P_{withNet}$  and  $P_{withoutNet}$  are the average power consumed over the time interval  $t_m$  with and without embedding a sensor network, respectively, and both are calculated using Eq. (17):

$$P_{with/withoutNet} = \frac{1}{t_m} \int_0^{t_m} P(t)\, dt \tag{17}$$

where P(t) is the instantaneous power consumption.
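When the instantaneous power is available only as discrete samples, the integral in Eq. (17) has to be approximated. The sketch below uses the trapezoidal rule, which is an assumption about how sampled readings could be averaged and not necessarily how the measurement tool computes its values; the function names and sample data are hypothetical.

```python
def average_power(times_s, power_w):
    """Trapezoidal approximation of Eq. (17) over sampled power readings."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time/power samples")
    energy_j = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy_j += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy_j / (times_s[-1] - times_s[0])


def power_overhead(times_s, p_with_w, p_without_w):
    """Power overhead of Eq. (16) from two runs sampled at the same times."""
    return average_power(times_s, p_with_w) - average_power(times_s, p_without_w)


if __name__ == "__main__":
    t = [0.0, 1.0, 2.0]           # hypothetical sample times in seconds
    without_net = [1.0, 1.0, 1.0]  # hypothetical readings in watts
    with_net = [1.0, 1.2, 1.4]
    print("P_OH =", power_overhead(t, with_net, without_net), "W")
```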

# 5.4. Thermal map error

A very important characteristic of a temperature sensor is its accuracy. Temperature measurement error is therefore a useful metric that should be considered at design time. The thermal map error of a sensor network directly affects the efficiency of the system as well as of the DTM techniques. The inaccuracy of a temperature sensor may cause problems such as performance degradation, due to early activation of DTM, or reliability degradation, due to its late activation [32]. In a network of temperature sensors, higher sensitivity results in less error, more accurate measurement, and hence more accurate thermal profiling. Generally, the sensitivity of a sensor is defined as the change in the output of the sensor per unit change in the parameter being measured. In an RO-based temperature sensor, the sensitivity is defined as the reduction in the RO's frequency per 1 °C increase, as in Eq. (18):

$$Sensitivity = \frac{f_{\max} - f_{\min}}{T_{\max} - T_{\min}} \tag{18}$$

where  $f_{max}$  and  $f_{min}$  are the oscillation frequencies of the RO at the minimum temperature  $T_{min}$  and the maximum temperature  $T_{max}$ , respectively. The RO frequency is measured by counting the RO's pulses during the sample period, as in Eq. (19):

$$f = \frac{count}{t_s} \tag{19}$$

where *count* is the number of RO pulses obtained by reading the counter's output,  $t_s$  is the sample period in which the sensor is activated, and f is the oscillation frequency of the RO. The sensitivity of a sensor illustrates how much the RO frequency or counter value changes over a given temperature range. Higher sensitivity allows sensors to sense small changes in temperature.
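Eqs. (18) and (19) combine into a short calculation. In the sketch below, the counter readings, the 40 μs sample period, and the temperature range are hypothetical values chosen only to make the arithmetic easy to follow; the function names are likewise illustrative.

```python
def frequency_hz(count, sample_period_s):
    """RO frequency of Eq. (19): pulses counted divided by the sample period."""
    if sample_period_s <= 0:
        raise ValueError("sample period must be positive")
    return count / sample_period_s


def sensitivity_hz_per_degc(f_max_hz, f_min_hz, t_min_degc, t_max_degc):
    """Sensitivity of Eq. (18): frequency drop per degree Celsius."""
    if t_max_degc <= t_min_degc:
        raise ValueError("t_max must be greater than t_min")
    return (f_max_hz - f_min_hz) / (t_max_degc - t_min_degc)


if __name__ == "__main__":
    t_s = 40e-6                           # assumed sample period
    f_cold = frequency_hz(20000, t_s)     # hypothetical count at T_min
    f_hot = frequency_hz(19000, t_s)      # hypothetical count at T_max
    s = sensitivity_hz_per_degc(f_cold, f_hot, 30.0, 55.0)
    print(f"sensitivity = {s / 1e6:.3f} MHz per degree Celsius")
```

With these placeholder numbers, a drop of 1000 counts over 25 °C corresponds to about 40 counts per degree, which indicates how finely temperature changes can be resolved for a given sample period.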

In order to measure and evaluate the accuracy of different sensor networks, they need to be compared to a fixed and more accurate network, called the reference network. As mentioned in the literature review, the accuracy of the temperature measured by RO-based thermal sensors has been validated by comparing the temperatures obtained from RO-based sensors to values obtained from well-known approaches, i.e. simulators like HotSpot [25] and an infrared camera [9]. Therefore, an array of digital temperature sensors based on the RO with the highest sensitivity among all designs is embedded within the reference network to provide a reference model of the thermal map of the FPGA. We assume that the reference network has m times more sensors than the examined networks in order to make it more accurate. In other words, one out of each n sensors is excluded from the reference network, and the temperature of each excluded sensor is estimated as the average of the values of the sensors in the immediately neighboring grids. Then, these values are compared with the actual values obtained from the corresponding temperature sensors in the reference model. To clarify this evaluation method, consider an example in which a grid-based array of 4×4 temperature sensors is placed in the reference network. As shown in Fig. 7a, a set of sensors S<sub>ref</sub>={S<sub>r1</sub>,  $S_{r2}$, ...,  $S_{r16}$} is allocated, one at the center of each grid (reference model). The top view of an examined sensor network is shown in Fig. 7b. As seen, one out of each two sensors is excluded from the reference

| (a) |    |    |    | (b) |    |    |    |
|-----|----|----|----|-----|----|----|----|
| 1   | 2  | 3  | 4  | 1   | X  | 3  | Y  |
| 5   | 6  | 7  | 8  | ○   | 6  | ○  | 8  |
| 9   | 10 | 11 | 12 | 9   | ○  | 11 | ○  |
| 13  | 14 | 15 | 16 | ○   | 14 | ○  | 16 |

**Fig. 7.** The floorplan of the sensors' placement: (a) reference network and (b) examined sensor network.

network. In this case, in the examined network (Fig. 7b), the temperatures obtained from the set of sensors  $S_{net}=\{S_{n1}, S_{n3}, S_{n6}, S_{n8}, S_{n9}, S_{n11}, S_{n14}, S_{n16}\}$  are compared with the temperatures obtained from the corresponding sensors  $S_{ref}=\{S_{r1}, S_{r3}, S_{r6}, S_{r8}, S_{r9}, S_{r11}, S_{r14}, S_{r16}\}$  in the reference network. Also, the temperature of each grid that does not include a sensor is estimated as the average temperature of the sensors in the immediately neighboring grids. For instance, the temperature values of locations X and Y are calculated using  $[(T_{s1}+T_{s3}+T_{s6})/3]$  and  $[(T_{s3}+T_{s8})/2]$ , respectively.

We define the thermal map error  $Tmap_{Error}$  of a sensor network as a measure of how close the measured/estimated local temperatures obtained from the examined network are to the temperatures measured by the reference network, as in Eq. (20):

$$Tmap_{Error} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (T_{ref,i} - T_{sens,i})^2} \tag{20}$$

where *n* is the number of network grids,  $T_{ref, i}$  is the reference network's temperature in grid *i*, and  $T_{sens, i}$  is the temperature of the sensor in the same grid of the examined network. According to (20), not only the absolute extrema but also every relative extremum of the chip temperature is considered when evaluating the thermal map error of each network. Indeed, due to the positive feedback between leakage current and junction temperature, all relative extrema should also be considered in the DTM of chips because they may become hotspots within several milliseconds. The comparison results of different sensor designs in terms of the basic evaluation metrics are reported in Section 7.
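The evaluation procedure of this subsection, i.e. estimating the temperature of sensor-less grids from their immediate neighbors and then applying Eq. (20), can be sketched as follows. The sketch assumes a four-neighbor (up, down, left, right) definition of "immediate neighbor", which reproduces the X and Y examples of Fig. 7b; the function names and the temperature values are hypothetical.

```python
import math

# Zero-based (row, column) positions of the sensors kept in Fig. 7b,
# i.e. sensors 1, 3, 6, 8, 9, 11, 14, and 16 of the 4x4 grid.
KEPT = [(0, 0), (0, 2), (1, 1), (1, 3), (2, 0), (2, 2), (3, 1), (3, 3)]


def fill_missing(measured, rows, cols):
    """Build a full thermal map; sensor-less grids get the neighbor average."""
    grid = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if (r, c) in measured:
                grid[r][c] = measured[(r, c)]
                continue
            # Assumed neighborhood: up, down, left, right.
            neighbors = [
                measured[pos]
                for pos in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if pos in measured
            ]
            if not neighbors:
                raise ValueError(f"grid {(r, c)} has no neighboring sensor")
            grid[r][c] = sum(neighbors) / len(neighbors)
    return grid


def thermal_map_error(reference, measured):
    """Root-mean-square error of Eq. (20) over all grids."""
    rows, cols = len(reference), len(reference[0])
    estimated = fill_missing(measured, rows, cols)
    squared = sum(
        (reference[r][c] - estimated[r][c]) ** 2
        for r in range(rows)
        for c in range(cols)
    )
    return math.sqrt(squared / (rows * cols))


if __name__ == "__main__":
    # Hypothetical reference map in degrees Celsius (warmer towards one corner).
    reference = [[30.0 + 2 * r + c for c in range(4)] for r in range(4)]
    # Assume the examined sensors read the same values as the reference ones.
    measured = {pos: reference[pos[0]][pos[1]] for pos in KEPT}
    print("Tmap_Error =", thermal_map_error(reference, measured))
```

In this example the remaining error comes only from interpolation at the sensor-less grids; with real sensors, the readings at the kept positions would also differ from the reference values.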

# 6. Quality factor metric

The value of each basic evaluation metric strongly depends on the sensor design in the network. Besides, each design can be implemented with variable RO length, counter width, and number of sensors, which results in a large design space. While several RO-based temperature sensor designs with their own specific features have been introduced [5-15], it is very difficult to judge them considering multiple contradictory relative performance metrics. To address this issue, we propose the QF metric, a novel metric for evaluating the efficiency of RO-based temperature sensor designs from different aspects. Using the QF metric, it is possible to characterize each sensor network and investigate the influence of the sensor design. The proposed QF is a figure of merit (FOM) based on the sensors' overheads and accuracy that determines the quality of a sensor network. It provides the information needed to choose an optimal sensor design based on the basic metrics obtained.

Ideally, a network of temperature sensors would meet the following four specifications simultaneously: low area, thermal, and power overheads, and low thermal map error. In other words, for a sensor network, better overhead metrics mean minimizing the area overhead, the thermal overhead, and the power overhead. Also, the better the accuracy, the lower the thermal map error. Generally, however, there are design trade-offs between these parameters due to the different designs of RO-based temperature sensors.

The trade-offs between basic evaluation metrics can be observed with respect to three main aspects as follows:

- 1) The RO stage: the longer the RO stage is,
  - (i) The more the occupied resources, the more the  $A_{OH}$ .
  - (ii) The lower the oscillation frequency (see Eq. (1)), the less the T<sub>OH</sub>.
  - (iii) Accordingly the less the  $P_{OH}$ .
  - (iv) The lower the sensitivity, the higher the  $Tmap_{Error}$ .

  Therefore, by increasing the RO stage, the  $T_{OH}$  and the  $P_{OH}$  improve but, unfortunately, the  $A_{OH}$  as well as the  $Tmap_{Error}$  get worse, and vice versa.

- 2) The RO design: different configurations of the RO result in different oscillation frequencies, so the aforementioned trade-offs between basic evaluation metrics can be observed similarly.
- 3) The number of sensors: more accurate sensor networks (i.e. lower  $Tmap_{Error}$ ) usually have more embedded sensors in order to efficiently detect more hotspots. Hence, they occupy more area, consume more power, and generate more heat, while less accurate sensor networks (i.e. higher  $Tmap_{Error}$ ) may have lower  $A_{OH}$ ,  $T_{OH}$ , and  $P_{OH}$ .

At this point, an interesting question is: what is the best design of the sensor network? To address this design challenge, we need a metric that can provide a trade-off between minimizing the area overhead and the thermal map error and also minimizing the thermal and power overheads. Therefore, we propose the QF metric to demonstrate how multiple criteria can be combined and observed as a single metric to evaluate trade-offs between basic evaluation metrics for temperature sensor networks.

A second question is: what is the proper number of temperature sensors in a network? In principle, it should be high enough to sense the finest spatial granularity of each local temperature of the design implemented on a target FPGA, which causes large overheads. Fortunately, many researchers, e.g. [11,33,34], have proposed effective algorithms to solve the thermal sensor allocation and placement problems and determine the optimal number and location of sensors. Hence, in this work, we only explore different designs of the sensor in a network, not the number of sensors, to find the most efficient design.

To quantify how efficient a temperature sensor network is, we use the product of the four basic metrics (i.e.  $A_{OH}$ ,  $T_{OH}$ ,  $P_{OH}$ , and  $Tmap_{Error}$ ) with equal weight, following the proven validity of the well-known power-delay product (PDP) and EDP metrics in electronics design. Since higher quality is better, we define the QF as the reciprocal of the product of these four basic metrics, as in Eq. (21):

$$QF := \frac{1}{A_{OH} \times T_{OH} \times P_{OH} \times Tmap_{Error}} \tag{21}$$

An ideal design of the sensor network would simultaneously have the minimum values of  $A_{OH}$ ,  $T_{OH}$ ,  $P_{OH}$ , and  $Tmap_{Error}$  among all designs. However, there are always trade-offs between these criteria. The QF metric captures these design trade-offs and assigns a single numerical score to each design. Note that the proposed QF metric expressed in Eq. (21) is the basic version of the FOM (i.e. the QF metric) of an RO-based temperature sensor network. As stated, the QF metric is proposed to help designers evaluate different designs when there are several objectives. In other words, the QF metric helps to integrate several metrics into one single value. There are several similar well-known metrics, such as PDP and EDP, which have the same goal. However, just as various versions of the EDP metric (e.g. ED<sup>2</sup>P) have been developed for specific applications, various versions of the QF metric can be introduced that focus on specific metric(s). The fundamental objective of proposing the QF metric is to establish an FOM of the RO-based temperature sensor network that can combine various metrics (including those mentioned in Section 5) into one single score for evaluating the trade-off between multiple contradictory metrics and objectives.
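To illustrate how Eq. (21) turns four metrics into one score, the sketch below computes the QF for two made-up designs and ranks them. The design names and all metric values are hypothetical and do not correspond to any result in this work; the check for strictly positive inputs is an added assumption, since the reciprocal is undefined when any metric is zero.

```python
def quality_factor(a_oh, t_oh, p_oh, tmap_error):
    """Quality factor of Eq. (21): reciprocal of the product of the metrics."""
    metrics = (a_oh, t_oh, p_oh, tmap_error)
    if any(m <= 0 for m in metrics):
        raise ValueError("all four metrics must be strictly positive")
    return 1.0 / (a_oh * t_oh * p_oh * tmap_error)


if __name__ == "__main__":
    # Hypothetical designs: (A_OH in LUTs, T_OH in degC, P_OH in W, Tmap_Error in degC)
    designs = {
        "design_A": (175, 0.8, 0.05, 1.2),
        "design_B": (420, 0.6, 0.09, 0.9),
    }
    ranked = sorted(
        designs, key=lambda name: quality_factor(*designs[name]), reverse=True
    )
    for name in ranked:
        print(name, quality_factor(*designs[name]))
```

Because the metrics enter as a product with equal weight, the absolute QF value depends on the units used; only comparisons between designs measured in the same units are meaningful.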

# 7. Experimental evaluation results and analysis

In this section, we first introduce the experimental setup used in this work for evaluating the proposed temperature sensor and also various sensor networks. Next, the experimental evaluation results of each part are presented and discussed.

Table 2. The Xilinx Virtex-5 LX50T specifications.

| Category         | Resource                    | Value      |
|------------------|-----------------------------|------------|
| Logic resources  | Slices                      | 7200       |
|                  | Logic cells                 | 46,080     |
|                  | CLB flip-flops              | 28,800     |
| Memory resources | Total block RAM (Kb)        | 2160       |
|                  | Block RAM (36 Kb each)      | 60         |
| Clock resources  | Digital clock manager (DCM) | 12         |
|                  | Phase-locked loop (PLL)     | 6          |
| Speed grades     | Commercial                  | -1, -2, -3 |
|                  | Industrial                  | -1, -2     |

# 7.1. Experimental setup

This subsection explains the system setups for evaluating the proposed RO-based temperature sensor and the sensor networks.

# 7.1.1. System setup for evaluating the proposed RO-based temperature sensor

For our experiments, the Genesys<sup>TM</sup> development board, which contains the Xilinx Virtex-5 LX50T FPGA, is used. The important specifications of the target FPGA are listed in Table 2. The system setup of all our experiments includes the following six main components, as shown in Fig. 8:

- (i) The MicroBlaze soft microprocessor core which controls the peripherals using the PLB and also runs the software application written in C language that fits into 32 KB of block random access memory in order to read sensors' data and decode them if necessary.
- (ii) The proposed temperature sensor or examined sensor networks.
- (iii) The heat generator circuit (heater) which is utilized to increase the die temperature in the desired range.
- (iv) The timer IP core which activates the proposed sensor/sensor network for a fixed 40 μs time window.
- (v) The system monitor IP core composed of a 10-bit 200-kilosample/s analog-to-digital converter (ADC), a register file interface, and two on-chip sensors (i.e. voltage and temperature sensors) situated in the center of the Virtex-5 FPGA, which is used to read the values of the built-in sensors in the calibration procedure.
- (vi) The universal asynchronous receiver/transmitter (UART) IP core which logs the proposed sensor/sensor networks' data and system monitor readings through the RS232 interface to a host PC.

The MicroBlaze soft microprocessor core and all of its peripherals operate at 100 MHz. The proposed temperature sensor, several designs of the RO-based temperature sensor, and the heat generator circuit are synthesized using HDL and connected to MicroBlaze via PLB, allowing



Fig. 8. Block diagram of our experimental setup implemented on the Xilinx Virtex-5 LX50T FPGA.



Fig. 9. Schematic of the one-level LUT-based oscillator heater.

the MicroBlaze to control them and access the measured data. The heat generator circuit is composed of one-level LUT-based oscillators, which enable the maximum toggling frequency [24]. Fig. 9 shows the LUT-based oscillator with a single LUT. We instantiate 10,000 one-level LUT oscillators to heat up the chip and calibrate the proposed sensor over a wide temperature range.

# 7.1.2. System setup for evaluating temperature sensor networks

In order to evaluate the relative performance and determine the QF of different sensor networks, the system is configured similarly to the previous system setup described in Section 7.1.1 (Fig. 8), except for two components. The first change is that the proposed sensor is replaced by a sensor network consisting of 35 sensors. The second change is the configuration of the heater circuit. Since a set of well-known benchmarks operates in a range of temperatures from  $T_{min}$=30 °C to  $T_{max}$=55 °C [31], we study the influence of sensor designs on the network performance (i.e. thermal overhead, power overhead, etc.) in this temperature range. To do so, the heater is implemented using the LUT-FF-based pipeline [19], which consists of 10 pipelines with 1000 stages, where each stage contains an LUT and an FF clocked at 400 MHz (Fig. 10). In all our experiments, a regular grid-based sensor network is constructed by partitioning an area of 56×90 slices into a regular grid of 8×9 tiles, where between each pair of tiles there is a temperature sensor placed at the center, as described in Section 5.4. From one experiment to another, only the sensor design changes; to have a fair comparison, the locations of the sensors and the heat generator circuit are exactly the same for all experiments. Taking advantage of the Genesys<sup>™</sup> Virtex-5 FPGA development board, we can measure the power consumption of a network of RO-based temperature sensors by using the Power Meter, a software utility provided with the Digilent Adept tool. It provides highly accurate real-time voltage, current, and power readings from the onboard TI power supply monitors and allows for data transfer with Xilinx FPGAs.

As mentioned, the temperature sensor network can be implemented with various designs of the RO and the counter, each of which can be configured with variable length/width, which creates a large design space. In fact, due to intra-device and intra-die variations, for a single particular design, the values of the performance evaluation metrics may vary from one chip to another, resulting in unfair comparisons of various designs. To solve this problem, we individually implement



Fig. 10. Schematic of the LUT-FF-based pipeline heat generator.

each of the already proposed designs and also other possible designs of the RO-based temperature sensor on a single Virtex-5 LX50T FPGA. For a fairer comparison, we first assume that all sensors are composed of a fixed three-stage RO and a 15-bit binary counter or 3-moduli set RNS ring counter, and the networks are evaluated only for different RO and counter configurations in terms of the proposed performance evaluation metrics. Then, the RO length (i.e. from 3 to 31 stages) and the counter width are explored in order to evaluate the influence of variable RO length/counter width on the efficiency of the temperature sensor network.

# 7.2. Experimental evaluation results

The experimental results of this work are divided into four main categories as follows. In Section 7.2.1, we test and evaluate the proposed temperature sensor. In Section 7.2.2, the second experiment series investigates twelve different configurations of the sensor in the network and compares them with each other, in terms of four basic evaluation metrics, which are defined in Section 5. In Section 7.2.3, the influences of the RO length and the counter width are investigated on the relative-performance evaluation metrics of the sensor network. Finally, in Section 7.2.4, according to the QF metric, defined in Section 6, the best structure of the sensor network is introduced.

# 7.2.1. Results of the proposed temperature sensor

In this experiment, we plan to compare the proposed temperature sensor with previously published designs in terms of resource utilization, temperature measurement range, and sensitivity.

7.2.1.1. Resource utilization. The proposed RO-based temperature sensor, described in Section 4, is composed of only 5 LUTs in total: two LUTs for the RO and three for the counter. Table 3 compares our sensor's LUT utilization with that of other sensor designs. As can be seen, the proposed sensor is smaller than previous designs. It occupies 37.5% fewer LUTs than the most compact sensor previously proposed (i.e. [7]). For the problem of thermal sensor allocation and placement, in which the important goals are to find the best locations as well as the number of sensors needed to monitor and cover a set of hotspots, the resource utilization of the sensor is quite important, especially when a high-complexity application must be mapped onto a resource-limited target FPGA. After finding the desired location, if there are sufficient available resources at the location where the sensor is to be inserted, there is no problem. If the resources are not sufficient, however, the application must be remapped in order to free this location. In a reconfigurable fabric, remapping an application may change the locations of the hotspots. An ultra-compact sensor therefore gives the opportunity to insert the sensor in such a location [11]. Moreover, it leads to less overhead of a sensor network.

7.2.1.2. Temperature measurement range. Temperature ranges vary for different sensor types. Every RO-based temperature sensor is

 Table 3

 Comparison of the RO-based temperature sensor's LUT counts.

| Design   | Number of sensor LUTs | RO stages | Target FPGA |
|----------|-----------------------|-----------|-------------|
| [22]     | 140                   | N/A       | ACEX 1K     |
| [35]     | 100                   | 47        | Virtex-4    |
| [20]     | 87                    | 31        | Virtex-5    |
| [29]     | 40                    | N/A       | Virtex-5    |
| [10]     | 34                    | 7         | Virtex-1    |
| [8]      | 24                    | 7         | Spartan-6   |
| [7]      | 8                     | 3         | Virtex-5    |
| Proposed | 5                     | 3         | Virtex-5    |

designed to operate over a specified range of temperature. This range is usually fixed and, in FPGA-based systems, strongly depends on the calibration process; if it is exceeded, a significant error may result in the temperature estimated by the calibration equation. In order to use a soft-sensor, a mapping function from the RO oscillation frequency f to the junction temperature T is first required to translate the oscillation frequency to the corresponding local temperature. The frequency to temperature converter (FTC) equation can be obtained by regression analysis. In this experiment, the heater based on the LUT oscillator (Fig. 9) is applied to heat up the FPGA while the MicroBlaze reads out the sensor counts in a fixed 40 µs sample period. Simultaneously, the on-chip temperature, used as the reference for the measurement, and the FPGA core voltage are measured; both are provided by the built-in hard-sensors through the System Monitor IP core. Then, various polynomial fitting models are applied, for instance via curve fitting in MATLAB, to find an effective FTC function.

We find that a first-order polynomial function cannot provide a good fit because, in contrast to a high-voltage FPGA core, the frequency response of an RO is a nonlinear function of temperature in modern FPGAs, as noted in Section 4. Instead, a second-order polynomial model provides a good fit. A further increase in polynomial order does not notably improve the fitting error. Due to intra-die variation, it is necessary to consider delay variation within the chip. Since the frequency of an RO at location (*x*,*y*) is a function of not only the local temperature but also the supply voltage [7,12], the FTC function (calibration equation) should be written as a two-variable quadratic polynomial as in Eq. (22):

$$T(x, y) = c_1 f^2(x, y) + c_2 V_{DD}^2 + c_3 f(x, y) \times V_{DD} + c_4 f(x, y) + c_5 V_{DD} + c_6 \tag{22}$$

where f(x,y) is the frequency of the RO at location (x,y), T(x,y) is the local temperature, $V_{DD}$ is the core voltage, and $c_i$ are the calibration coefficients, which can be calculated from given initial temperatures and the corresponding RO frequencies. Compared with a linear (first-order) model, which yields a root mean square error (RMSE) of 3.99 °C in temperature estimation, the second-order polynomial function reduces the RMSE by 2.66 °C for our sensor, to an RMSE of 1.33 °C. After this initial calibration, the sensor can translate its oscillation frequency to the corresponding local temperature. Note that the temperature around a soft-sensor at an arbitrary location of the chip does not exactly match the temperature seen by the built-in sensor, which results in an estimation error in the calibration process. To address this issue, the proposed sensor is located in the middle of the chip, the closest point to the built-in hard-sensor, in order to minimize the temperature measurement error and increase the calibration precision. Therefore, the sensor is manually placed at location (x,y)=(28,58) using placement directives in the user constraints file (UCF).
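As a minimal sketch of how Eq. (22) and the RMSE figure of merit could be evaluated in software, the following Python fragment may help. The function names `ftc_temperature` and `rmse` and the placeholder coefficients are assumptions made for illustration only; they are not part of the described system, and the unit of f expected by the fitted coefficients is not restated in this section.

```python
import math

def ftc_temperature(f, vdd, c):
    """Evaluate the two-variable quadratic FTC model of Eq. (22).

    f   : RO oscillation frequency at location (x, y)
    vdd : core supply voltage V_DD
    c   : sequence of the six calibration coefficients (c1..c6)
    """
    c1, c2, c3, c4, c5, c6 = c
    return (c1 * f * f + c2 * vdd * vdd + c3 * f * vdd
            + c4 * f + c5 * vdd + c6)

def rmse(estimated, reference):
    """Root mean square error between estimated and reference temperatures."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Placeholder coefficients (hypothetical, NOT the fitted values):
# with only c4 = 1 the model degenerates to T = f, which makes the
# evaluation easy to check by hand.
demo_coeffs = (0.0, 0.0, 0.0, 1.0, 0.0, 0.0)
demo_temperature = ftc_temperature(350.0, 1.0, demo_coeffs)
```

In practice, the reference list passed to `rmse` would hold the hard-sensor readings and the estimated list the outputs of `ftc_temperature` for the corresponding frequency and voltage samples.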

Table 4 lists the calibration coefficients  $c_i$ , the coefficient of determination  $R^2$ , and the *RMSE* of the FTC equation. In this experiment, the temperature is swept from 5 to 90 °C. Table 5 compares the temperature measurement range of the proposed sensor with previous designs. The proposed sensor can operate over the

Table 4

Calibration coefficients' values of the FTC equation.

| Parameter | Value     |
|-----------|-----------|
| $c_1$     | -0.2564   |
| $c_2$     | -3.861e+4 |
| $c_3$     | 189.3     |
| $c_4$     | 6.15      |
| $c_5$     | 4152      |
| $c_6$     | -2722     |
| $R^2$     | 0.9996    |
| RMSE (°C) | 1.33      |

Table 5

Comparison of temperature measurement range.

| Design   | FPGA          | Process (nm) | Temperature range (°C) |
|----------|---------------|--------------|------------------------|
| [36]     | XC3000        | 600+         | 20~100                 |
| [37]     | XC4000        | 600+         | 20~120                 |
| [25]     | Virtex-II Pro | 130          | 35~60                  |
| [38]     | Virtex-II     | 180          | 50~125                 |
| [22]     | ACEX 1K       | 220          | -40~130                |
| [35]     | Virtex-4      | 90           | 34~79                  |
| [39]     | Virtex-5      | 65           | 32~60                  |
| [7]      | Virtex-5      | 65           | 0~85                   |
| [9]      | Cyclone III   | 65           | 20~90                  |
| [8]      | Spartan 3E    | 90           | 0~70                   |
| [20]     | Virtex-5      | 65           | -20~100                |
| [15]     | Spartan 3     | 90           | 10~70                  |
| [23]     | Cyclone IV    | 60           | 20~80                  |
| Proposed | Virtex-5      | 65           | 5~90                   |

commercial temperature range (i.e. 0-85 °C [21]) with a very good approximation.

7.2.1.3. Sensitivity. One of the most important parameters of a temperature sensor is its sensitivity, i.e. how much the RO's oscillation frequency/counter value changes over the range of operating temperature. Fig. 11 plots the frequency response of the proposed RO for a range of temperature from 5 to 90 °C. As seen, the curve shows a nonlinear response to temperature variations. According to Eq. (18), the sensitivity of the proposed sensor is about 394 kHz/°C. Fig. 12 plots the dependence of the oscillation frequency on supply voltage variation from 0.95 V to 1.05 V. The plot demonstrates how core voltage variations, e.g. due to changes in the workload, can affect the RO frequency even at a constant temperature (i.e. ambient temperature). Therefore, as noted, the power supply voltage $V_{DD}$ must be considered as a variable in the calibration equation (i.e. Eq. (22)).

Table 6 compares the sensitivity of the proposed sensor with other designs. As seen in the table and also discussed in [7], as the supply voltage scales down and the transistors' feature size shrinks, the dependency of the RO frequency on temperature variation becomes drastically weaker, resulting in a bottleneck for RO-based sensor designs. For instance, in [36] and [37] the authors report a 0.35% decrease of the RO frequency per 1 °C increase. Also, the authors in [35] report a 0.11%/°C reduction of the RO frequency with the 90 nm feature size of the Virtex-4 FPGA. The sensitivity of the RO-based temperature sensor decreases to 0.032%/°C for 65 nm technology [7]. Our proposed sensor increases the dependency of the RO frequency on temperature to 0.098%/°C. The experimental result confirms that, for the same core voltage and process technology, the proposed design has a stronger temperature dependence than previous designs, as can be observed in Table 6. Due to the novel design of the RO, which results in bigger



Fig. 11. The frequency response of the proposed RO vs. temperature for a range of 5-90 °C.



Fig. 12. The proposed sensor frequency dependence on supply voltage variations from 0.95 V to 1.05 V.

Table 6

Comparison results of sensor sensitivity.

| Design   | FPGA          | Process (nm) | Supply voltage<br>(V) | Sensitivity (%/°C) |
|----------|---------------|--------------|-----------------------|--------------------|
| [36]     | XC3000        | 600+         | 5                     | 0.35               |
| [37]     | XC4000        | 600+         | 5                     | 0.35               |
| [25]     | Virtex-II Pro | 130          | 1.5                   | 0.21               |
| [38]     | Virtex-II     | 180          | 1.5                   | 0.15               |
| [22]     | ACEX 1K       | 220          | 2.5                   | 0.12               |
| [35]     | Virtex-4      | 90           | 1.2                   | 0.11               |
| [39]     | Virtex-5      | 65           | 1                     | 0.036              |
| [7]      | Virtex-5      | 65           | 1                     | 0.032              |
| Proposed | Virtex-5      | 65           | 1                     | 0.098              |

oscillation frequency changes, and also due to instantiating an open latch along with each CFGLUT5 element, the sensitivity of the proposed sensor is boosted to 2.72 times that of the most sensitive previous design with the same specifications, i.e. core voltage and process technology ([39], at 0.036%/°C in Table 6).

It is very important that, with respect to the maximum oscillation frequency of the RO, the counter works reliably during operation. Hence, the counter must not overflow at such a high frequency. In order to study the operation of the proposed sensor, let Eq. (19) be rewritten as

$$count_{\max} = f_{\max} \times t_s \tag{23}$$

According to (23), with respect to the maximum frequency of the proposed RO (i.e.  $f_{max}$ =401.8 MHz) and the sampling period of  $t_s$ =40 µs, the maximum count value will be obtained as

$$count_{\max} = (401.8 \times 10^{6}) \times (40 \times 10^{-6}) = 16072 \tag{24}$$

According to (24), it is clear that the utilized RNS ring counter with the maximum count of 32,736 would be enough for the sensor for reliable operation in a frequency of about 400 MHz.
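The overflow check of Eqs. (23) and (24) can be sketched in a few lines of Python. Integer arithmetic is used here to avoid floating-point rounding; the helper name `max_count` and the variable names are illustrative assumptions, while the numerical values are those quoted in the text.

```python
from math import prod

def max_count(f_max_hz, t_s_us):
    """Eq. (23) in integer arithmetic: count_max = f_max * t_s.

    f_max_hz is given in Hz and t_s_us in microseconds, so the product
    is divided by 1e6. Floor division truncates any fractional count.
    """
    return f_max_hz * t_s_us // 1_000_000

RNS_MODULI = (31, 32, 33)        # 3-moduli set quoted in the text
rns_capacity = prod(RNS_MODULI)  # maximum count of the RNS ring counter

count = max_count(401_800_000, 40)  # f_max = 401.8 MHz, t_s = 40 us
headroom = rns_capacity - count     # remaining counts before overflow
fits = count < rns_capacity         # True when the counter cannot overflow
```

With these inputs the count stays at roughly half of the RNS counter's range, which is consistent with the statement that the counter operates reliably at about 400 MHz.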

# 7.2.2. Exploring the temperature sensor design in the network with constant RO length and counter width

To evaluate the relative performance of different configurations of the sensor in the network, four experiments are performed for each design. In these experiments, only different configurations of the RO-based temperature sensor are compared in terms of the basic evaluation metrics (i.e. area, thermal, and power overheads, and thermal map error). In all experiments, the length of the RO (i.e. 3 stages), the width of the counter (i.e. 15-bit binary counter or 3-moduli set RNS ring counter), and the number of sensors (i.e. 35 sensors) are kept constant.

7.2.2.1. Area overhead. We implement each design and compare its area overhead to other designs. Fig. 13 compares the area overhead of 12 designs of the RO-based temperature sensor. Because of heavy



Fig. 13. Comparison of area overhead of 12 RO-based temperature sensor designs in the examined network on a Virtex-5 FPGA.

utilization of binary counters, for each RO design, the area overhead of the examined sensor networks that consist of binary counters is 910 units more than that of the networks with RNS ring counters. Also, when the ROs include open latches (i.e. XORCY & LD, INV & LD, and CFGLUT5 & LD), the sensor network occupies more resources. For instance, the area overhead of the sensor network composed of INV & LD and the RNS ring counter is about 17% more than that of the corresponding design without latches (i.e. INV and the RNS ring counter). Moreover, because the 3-stage RO is implemented with two LUTs when it is realized using the CFGLUT5 elements, the networks of sensors composed of these primitive cells (i.e. CFGLUT5 or CFGLUT5 & LD) occupy fewer resources than other RO designs for a particular counter design. As an example, the sensor network based on the proposed sensor (i.e. CFGLUT5 & LD and the RNS ring counter) occupies 70 fewer resources (one LUT and one latch for each of 35 ROs) than the sensor network comprising INV & LD and the RNS ring counter. It should be noted that, since the aim of the paper is to compare different designs of the sensor relative to each other in a network, the resources occupied by shared components (i.e. MicroBlaze soft-processor core, Timer, System Monitor, and UART IP cores) have not been taken into account in the $A_{OH}$ calculation. However, since the bus interface of the sensor network with the PLB, which utilizes its own LUTs and FFs, is the only part that varies from one design to another, the resource utilization of the associated PLB has been considered in the area overhead.

Note that an *n*-bit binary counter, which can count up to $2^{n}$, occupies *n* LUTs and *n* FFs. Therefore, compared to a 15-bit binary counter, which utilizes 15 LUTs and 15 FFs and can count up to $2^{15}$=32768, the 3-moduli set RNS ring counter with almost the same maximum count, i.e. $31\times32\times33=32736$, occupies only 3 LUTs and 1 FF, meaning 5 times fewer LUTs and 15 times fewer FFs. More detailed information on the resource utilization of different designs of the RO-based temperature sensor is listed in Table 7, which is obtained by

Table 7

Comparison of resource utilization of twelve sensor designs in the examined network.

| Counter | RO                     | LUTs (sensor network) | LUTs (PLB) | FFs/latches (sensor network) | FFs/latches (PLB) |
|---------|------------------------|-----------------------|------------|------------------------------|-------------------|
| Binary  | XORCY/ INV             | 630                   | 197        | 525                          | 182               |
| Binary  | CFGLUT5                | 595                   | 197        | 525                          | 182               |
| Binary  | XORCY & LD/ INV & LD   | 630                   | 197        | 630                          | 182               |
| Binary  | CFGLUT5 & LD           | 595                   | 197        | 595                          | 182               |
| RNS     | XORCY/ INV             | 210                   | 197        | 35                           | 182               |
| RNS     | CFGLUT5                | 175                   | 197        | 35                           | 182               |
| RNS     | XORCY & LD/ INV & LD   | 210                   | 197        | 140                          | 182               |
| RNS     | CFGLUT5 & LD           | 175                   | 197        | 105                          | 182               |



Fig. 14. The map of conventional ROs' frequency variations due to intra-die variation of a Virtex-5 FPGA in the thermal equilibrium at  $T_i$  =30 °C.

using the Xilinx tool (i.e. PlanAhead) after the PAR process. Note that each sensor network includes a bus interface with the PLB in order to connect to the MicroBlaze soft processor, which utilizes a few LUTs and FFs, as can be seen in Table 7. For all of the sensor networks, the PLB consumes 197 additional LUTs and 182 FFs.
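The sensor-network columns of Table 7 can be reproduced from per-sensor resource counts, as the following Python sketch illustrates. The per-sensor breakdown (LUTs and latches per RO variant) is an assumption inferred from the table totals and the surrounding text, and the names `RO`, `COUNTER`, and `network_resources` are hypothetical.

```python
SENSORS = 35  # number of sensors in the examined network

# Per-sensor RO resources as (LUTs, latches); inferred from Table 7 totals.
RO = {
    "XORCY/INV": (3, 0),
    "CFGLUT5": (2, 0),
    "XORCY & LD/INV & LD": (3, 3),
    "CFGLUT5 & LD": (2, 2),
}

# Per-sensor counter resources as (LUTs, FFs), as stated in the text.
COUNTER = {
    "Binary": (15, 15),  # 15-bit binary counter
    "RNS": (3, 1),       # 3-moduli set RNS ring counter
}

def network_resources(counter, ro):
    """Return (LUTs, FFs/latches) of the sensor network, PLB excluded."""
    ro_luts, ro_latches = RO[ro]
    cnt_luts, cnt_ffs = COUNTER[counter]
    luts = SENSORS * (ro_luts + cnt_luts)
    ffs_latches = SENSORS * (ro_latches + cnt_ffs)
    return luts, ffs_latches

table7 = {(c, r): network_resources(c, r) for c in COUNTER for r in RO}

# Difference between binary and RNS networks for the same RO design.
binary_minus_rns = (sum(table7[("Binary", "XORCY/INV")])
                    - sum(table7[("RNS", "XORCY/INV")]))
```

Under these assumptions, `binary_minus_rns` corresponds to the 910-unit gap between binary and RNS networks mentioned in the area overhead discussion.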

7.2.2.2. Thermal overhead. In a network of RO-based temperature sensors, the thermal overhead normally depends on the oscillation frequencies of the ROs and the resources occupied by the counters. Because of intra-die variation in new fabrication technologies, even at a constant temperature (e.g. room/ambient temperature), a specific RO may have a different oscillation frequency at each location of the chip. Hence, in a network of temperature sensors, each sensor may have its own value in a thermal equilibrium condition. Fig. 14 shows the frequency profile of the conventional RO (i.e. INV) for a Virtex-5 FPGA in an idle mode at a junction temperature of $T_i=30$ °C. The X-slice and Y-slice coordinates represent the slice locations of the examined network, which is partitioned into a regular grid of 8×9 tiles containing 56×90 slices. It can be seen that the left side of the chip is noticeably faster than the right side due to the intra-die variation. For the chip under test, the difference in RO frequency between the slowest (i.e. $f_{min}$=307.12 MHz) and the fastest (i.e. $f_{max}$=352.90 MHz) regions exceeds 45 MHz. This variation causes each sensor to produce a different amount of heat, and thus affects the thermal overhead as well as the power consumption. Therefore, in order to provide accurate results, we consider the mean of the oscillation frequencies of the ROs in the calculation of the thermal as well as the power overhead of a sensor network. Fig. 15 shows the average oscillation frequency of six RO designs in the examined network. As seen, since each element (i.e. XORCY, XORCY & LD, INV, etc.) has a certain propagation delay, each design inherently has its own frequency.

![](_page_12_Figure_13.jpeg)

Table 8 compares the thermal overhead of 12 designs of the sensor

Fig. 15. Average oscillation frequency of 35 three-stage ROs in the sensor network with six different designs.

Table 8

Comparison results of thermal and power overheads of twelve sensor designs in the examined network (normalized).

| Counter | RO                             | $T_{OH}$        | $P_{OH}$                   |
|---------|--------------------------------|-----------------|----------------------------|
| Binary  | XORCY                          | 0.95            | 0.97                       |
|         | XORCY & LD                     | 0.93            | 0.95                       |
|         | INV                            | 1.00            | 1.00                       |
|         | INV & LD                       | 0.98            | 0.99                       |
|         | CFGLUT5                        | 1.09            | 1.07                       |
|         | CFGLUT5 & LD (Proposed RO)     | 1.06            | 1.05                       |
| RNS     | XORCY                          | 0.87            | 0.70                       |
|         | XORCY & LD                     | 0.83            | 0.69                       |
|         | INV                            | 0.95            | 0.74                       |
|         | INV & LD                       | 0.91            | 0.72                       |
|         | CFGLUT5                        | 1.05            | 0.80                       |
|         | CFGLUT5 & LD (Proposed sensor) | 1.01            | 0.78                       |

in the examined network. Note that the results of each metric are normalized with regard to the conventional design (i.e. INV-based RO and binary counter). The networks of sensors composed of binary counters occupy more resources than those with RNS ring counters, and hence dissipate more power, resulting in a larger thermal overhead. As an example, compared to the network of sensors composed of INV and the RNS ring counter, the conventional design of the sensor network (i.e. INV and binary counter) produces 5.23% more heat. Besides that, for a specific type of counter, when the oscillation frequency of the ROs increases, the thermal overhead of the network increases. For instance, the thermal overhead of the network based on the proposed sensor (i.e. CFGLUT5 & LD and RNS ring counter) is 21.7% more than that of the network constructed using XORCY & LD for the ROs and RNS ring counters, which has the lowest thermal overhead.

7.2.2.3. Power overhead. As stated in Section 7.2.2.2, the resources occupied by the counters as well as the ROs' oscillations dissipate power in a sensor network, resulting in power overhead. Table 8 compares the normalized power overhead of 12 designs of the sensor in the examined network. According to Eqs. (15) and (16), the higher the oscillation frequency of the ROs, the higher the dynamic power consumption and hence the power overhead of the sensor network, as illustrated in Fig. 16. For instance, for the sensor network based on the binary counter, the power consumption difference between the fastest (i.e. CFGLUT5) and the slowest (i.e. XORCY & LD) ROs is about 12.6%. Moreover, when the sensor network includes a specific RO design, the higher the resource utilization of the counters, the higher the power overhead. For example, for the CFGLUT5-based RO, the power overhead of the sensor network comprising binary counters is 33.7% more than that with RNS ring counters. In summary, as can be seen in Fig. 16 and Table 8, the higher the ROs' frequencies as well as the more the resources occupied by the counters

![](_page_13_Figure_6.jpeg)

Fig. 16. Relationship between the RO frequency as well as the counter design and the power overhead of the sensor network.

are, the higher the power dissipation is, the more the power overhead, the higher the heat generated by the sensor network, and thus the more the thermal overhead, and vice versa.
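The percentage comparisons quoted for the thermal and power overheads can be recomputed from the normalized entries of Table 8, as in the sketch below. Because the table values are rounded to two decimals, the recomputed percentages may deviate slightly from the figures in the text, which were presumably derived from unrounded measurements; the dictionary layout and the helper `percent_more` are illustrative assumptions.

```python
# (counter, RO) -> (T_OH, P_OH), normalized values copied from Table 8.
TABLE8 = {
    ("Binary", "XORCY"): (0.95, 0.97),
    ("Binary", "XORCY & LD"): (0.93, 0.95),
    ("Binary", "INV"): (1.00, 1.00),
    ("Binary", "INV & LD"): (0.98, 0.99),
    ("Binary", "CFGLUT5"): (1.09, 1.07),
    ("Binary", "CFGLUT5 & LD"): (1.06, 1.05),
    ("RNS", "XORCY"): (0.87, 0.70),
    ("RNS", "XORCY & LD"): (0.83, 0.69),
    ("RNS", "INV"): (0.95, 0.74),
    ("RNS", "INV & LD"): (0.91, 0.72),
    ("RNS", "CFGLUT5"): (1.05, 0.80),
    ("RNS", "CFGLUT5 & LD"): (1.01, 0.78),
}

def percent_more(a, b):
    """Return by how many percent a exceeds b."""
    return (a / b - 1.0) * 100.0

T, P = 0, 1  # column indices into the table tuples

# Thermal overhead: conventional (INV, binary) vs. INV with RNS counter.
t_inv = percent_more(TABLE8[("Binary", "INV")][T], TABLE8[("RNS", "INV")][T])
# Thermal overhead: proposed sensor vs. the lowest-overhead RNS design.
t_prop = percent_more(TABLE8[("RNS", "CFGLUT5 & LD")][T],
                      TABLE8[("RNS", "XORCY & LD")][T])
# Power overhead: fastest vs. slowest RO with binary counters.
p_ro = percent_more(TABLE8[("Binary", "CFGLUT5")][P],
                    TABLE8[("Binary", "XORCY & LD")][P])
# Power overhead: binary vs. RNS counter for the CFGLUT5-based RO.
p_cnt = percent_more(TABLE8[("Binary", "CFGLUT5")][P],
                     TABLE8[("RNS", "CFGLUT5")][P])
```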

7.2.2.4. Thermal map error. In order to evaluate the thermal map error of various designs, each network first needs to be calibrated and then compared to the reference network. Due to intra-die and intra-sensor variations in an array of temperature sensors, we need to calibrate the individual sensors at two or more equilibrium temperatures. To do so, in the first phase, the FPGA is put in the idle mode until it reaches thermal equilibrium, i.e. the ambient temperature of 30 °C in our experiment, which is confirmed by the built-in hard-sensor. In the second phase, the LUT-FF-based pipeline heater (Fig. 10) is utilized in order to heat up the chip. At this point, we wait long enough for thermal equilibrium to be reached, as reported by the Xilinx System Monitor. After gathering the data, i.e. the sensors' counts and the hard-sensor's values, the FTC function for each sensor is determined. Next, the aforementioned steps are repeated for each design.

Next, the reference network must be constructed in order to compare the thermal profile accuracy of various designs. Since the sensitivity, and hence the accuracy, of the proposed RO (i.e. CFGLUT5 & LD) is higher than that of other designs, it is utilized in the reference network in order to make it more accurate. Note that the thermal map error of a sensor network and the sensor's sensitivity depend only on the RO design, not on the counter. The reason is that the RO is the temperature-sensitive circuit and the capture counter just counts its oscillations for a fixed time interval (e.g. 40 µs). Therefore, the proposed RO along with a binary counter is used in the reference network, because the binary counter is easy to realize and is extensively utilized in other works as well. Hence, in this experiment, the counter design (i.e. 15-bit binary counter) is fixed and only the effect of various RO designs is taken into account. Also, in this work, the reference sensor network has twice as many sensors (m=2), i.e. 70 sensors, as the examined sensor networks in order to make it more accurate. The layouts of the examined and reference sensor networks are shown in Fig. 17a and b, respectively. In order to have a fair comparison, the floorplan as well as the placement of each sensor are fixed across experiments, i.e. using the LOC constraint, and only the RO design is changed.

Table 9 compares six RO designs in the examined sensor network in terms of the thermal map error as well as the sensitivity. As seen, the higher the sensitivity of the sensor, the lower the thermal map error, and hence the higher the accuracy of thermal profiling. For instance, the thermal map error of the sensor network based on the proposed RO is 10.2% less than that of the CFGLUT5-based RO. The reason is that instantiating an open latch along with each inverting element improves the sensor's sensitivity and hence the measurement accuracy. As another example, the sensitivity and the thermal map error of the network of sensors composed of ROs with the INV & LD configuration are 8.96% and 10.61% better than those of the conventional RO without latches (i.e. INV), respectively. The thermal maps of the chip under test provided by the reference network and by the examined sensor network that consists of ROs with the INV & LD configuration are shown in Fig. 18.
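The relationship between sensitivity and thermal map error can be checked directly against the entries of Table 9. The sketch below is only an illustration; the helper names are assumptions. It also makes explicit that a percentage reduction depends on which value is taken as the baseline, which is a possible explanation for how the two examples quoted in the text were computed.

```python
# (RO design, sensitivity in kHz/°C, thermal map error in °C) from Table 9.
TABLE9 = [
    ("XORCY", 240.0, 1.73),
    ("XORCY & LD", 257.6, 1.55),
    ("INV", 262.3, 1.46),
    ("INV & LD", 285.8, 1.32),
    ("CFGLUT5", 357.6, 1.17),
    ("CFGLUT5 & LD", 394.1, 1.05),
]
ROWS = {name: (sens, err) for name, sens, err in TABLE9}

# Check that the error falls strictly as the sensitivity rises.
by_sensitivity = sorted(TABLE9, key=lambda row: row[1])
errors = [row[2] for row in by_sensitivity]
monotonic = all(a > b for a, b in zip(errors, errors[1:]))

def percent_gain(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0

def percent_reduction(new, old, baseline):
    """Reduction from old to new, expressed relative to the given baseline."""
    return (old - new) / baseline * 100.0

# INV & LD versus INV: sensitivity gain and error reduction.
sens_gain = percent_gain(ROWS["INV & LD"][0], ROWS["INV"][0])
err_red_latch = percent_reduction(ROWS["INV & LD"][1], ROWS["INV"][1],
                                  baseline=ROWS["INV & LD"][1])
# Proposed RO versus CFGLUT5: error reduction relative to CFGLUT5.
err_red_prop = percent_reduction(ROWS["CFGLUT5 & LD"][1], ROWS["CFGLUT5"][1],
                                 baseline=ROWS["CFGLUT5"][1])
```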

7.2.3. Exploring the temperature sensor design in the network for various RO lengths and counter widths

In these experiments, we explore the RO length as well as the counter width in a network of conventional RO-based temperature sensors and compare the resulting designs with each other in order to study the influences of the RO length/counter width on the basic evaluation metrics. For this goal, the counter width is first kept constant, i.e. 14-bit binary counter or 3-moduli set RNS ring counter, in order to

![](_page_14_Figure_2.jpeg)

Fig. 17. The floorplan of the experimented sensors' placement: (a) the examined sensor network (35 sensors) and (b) the reference sensor network (70 sensors) as shown from Xilinx PlanAhead tool.

Table 9

Comparison results of sensitivity and thermal map error of the sensor network for six RO designs.

| RO design               | Sensitivity (kHz/°C) | $Tmap_{Error}$ (°C)        |
|-------------------------|----------------------|----------------------------|
| XORCY                   | 240.0                | 1.73                       |
| XORCY & LD              | 257.6                | 1.55                       |
| INV                     | 262.3                | 1.46                       |
| INV & LD                | 285.8                | 1.32                       |
| CFGLUT5                 | 357.6                | 1.17                       |
| CFGLUT5 & LD (Proposed) | 394.1                | 1.05                       |

investigate only the influences of the RO length. Therefore, at this point, the length of the conventional RO is explored from 3 to 31 stages because typically designers use between 3 and 31 inverting elements in the RO chain [7,8,10,12,13,20,25]. Also, the reason for selecting the aforementioned counter width (i.e. 14-bit binary counter and 3-moduli set RNS ring counter) is that, with respect to the oscillation frequency of the 3-stage RO, which has the maximum frequency among other RO lengths (see Fig. 19), the counter must work reliably during operation, as mentioned in Section 7.2.1.3. Note that, in the previous experiments discussed in Section 7.2.2, the average oscillation frequency of the RO based on the CFGLUT5 primitive cell is about 394 MHz and due to intra-die variation, the maximum oscillation frequency of some ROs can reach above 420 MHz. Therefore, in the previous experiments, the 15-bit binary counter/3-moduli set RNS ring counter is utilized.
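The choice between a 14-bit and a 15-bit binary counter follows from Eq. (23), and can be sketched as below. The helper `required_bits` is a hypothetical name; 352.90 MHz is taken as the fastest conventional RO reported for the chip under test and 420 MHz as the upper bound quoted for the CFGLUT5-based RO, both with the 40 µs sampling period.

```python
def required_bits(f_max_khz, t_s_us):
    """Return (max count, minimum binary counter width in bits).

    The frequency is given in kHz and the sampling period in microseconds,
    so kHz * us yields counts scaled by 1e3; floor division rescales it.
    An n-bit counter holds values up to 2**n - 1, which is exactly what
    int.bit_length() reports for the maximum count.
    """
    count = f_max_khz * t_s_us // 1000
    return count, count.bit_length()

# Conventional 3-stage RO: fastest instance on the chip under test.
conv_count, conv_bits = required_bits(352_900, 40)

# CFGLUT5-based RO: frequency bound quoted for the fastest instances.
cfg_count, cfg_bits = required_bits(420_000, 40)
```

Under these assumptions, the conventional RO fits in a 14-bit counter, whereas the faster CFGLUT5-based RO needs 15 bits, in line with the widths used in the two experiment series.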

Fig. 19 shows the average oscillation frequency of 35 conventional ROs in the examined network for lengths varying from 3 to 31 stages. As seen, the longer the RO, the lower the oscillation frequency (see Eq. (1)). For instance, the oscillation frequency of the 5-stage RO is 14.4% lower than that of the 3-stage RO.

The experimental results of 30 different RO-based temperature sensors for varying RO length (i.e. 15 designs) and two counter designs are listed in Table 10, in terms of the area, thermal, and power overheads as well as the thermal map error. Note that the results of each metric are normalized with regard to the design composed of the conventional binary counter and the most commonly used RO length (i.e. 7 stages) [5,8,10,25]. According to Eq. (13), for both counter designs, when the length of the RO increases, the area overhead of the sensor network increases linearly, as expected, i.e. by 70 LUTs for each increment of 2 stages. As an example, for a specific counter design (i.e. binary counter), the area overhead of the sensor network composed of 5-stage ROs is 5.2% higher than that of the sensor network constructed using the same counter and 3-stage ROs. Also, similarly to the previous experiments discussed in Section 7.2.2.1, for each RO length, the area overhead of the sensor network composed of the RNS ring counter is less than that with the binary counter because fewer resources are occupied. As seen, the resource utilization of the sensor network based on the RNS ring counter is on average 81.8% less than that based on the binary counter.
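The linear growth of the area overhead with the RO length can be illustrated with a simple resource model. The model below is an assumption, not the paper's Eq. (13) itself: it supposes one LUT per stage of the conventional RO, *n* LUTs and *n* FFs for an *n*-bit binary counter, 3 LUTs and 1 FF for the RNS ring counter, and the PLB interface figures from Table 7, normalized to the 7-stage design with the binary counter.

```python
SENSORS = 35
PLB = 197 + 182  # PLB interface LUTs + FFs, as reported in Table 7

# Counter resources per sensor as (LUTs, FFs).
COUNTERS = {"Binary": (14, 14), "RNS": (3, 1)}

def total_resources(ro_len, counter):
    """LUTs + FFs of the whole network, PLB interface included."""
    luts, ffs = COUNTERS[counter]
    return SENSORS * (ro_len + luts + ffs) + PLB

# Normalization base: 7-stage RO with the 14-bit binary counter.
BASE = total_resources(7, "Binary")

def a_oh(ro_len, counter):
    """Normalized area overhead, rounded to three decimals."""
    return round(total_resources(ro_len, counter) / BASE, 3)

LENGTHS = (3, 5, 7, 9, 11, 13, 15, 17)
binary_column = [a_oh(n, "Binary") for n in LENGTHS]
rns_column = [a_oh(n, "RNS") for n in LENGTHS]

# Each 2-stage increment adds two LUTs per sensor.
step = total_resources(5, "Binary") - total_resources(3, "Binary")
```

Under these assumptions, the two computed columns can be compared against the $A_{OH}$ entries of Table 10 for RO lengths 3 to 17.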

The thermal overhead of different designs of the RO-based temperature sensor in the examined network for varying RO length is reported in Table 10. The longer the RO, the lower the oscillation frequency, and hence the lower the $T_{OH}$. As an example, compared to the 3-stage RO, the network of sensors composed of 5-stage ROs and binary counters produces 10% less heat. Besides that, the sensor networks composed of binary counters occupy more resources than those with RNS ring counters, resulting in a larger thermal overhead. For instance, the thermal overhead of the sensor network based on the 3-stage RO and the 14-bit binary counter is 5.3% higher than that of the network of temperature sensors composed of the RNS ring counter and the same RO length (i.e. 3 stages).

Similarly to thermal overhead metric, the longer the RO stage is, the

![](_page_14_Figure_12.jpeg)

Fig. 18. Thermal map of the chip provided by (a) reference network and (b) examined sensor network composed of the ROs with INV & LD configuration.

![](_page_15_Figure_2.jpeg)

Fig. 19. Average oscillation frequency of 35 conventional ROs in the sensor network for variant RO length from 3 to 31.

Table 10

Relative-performance evaluation results of 30 RO-based temperature sensor designs in the network for constant counter width and variant RO length in terms of the area overhead (Eq. (13)), the thermal overhead (Eq. (14)), the power overhead (Eq. (16)) and the thermal map error (Eq. (20)) (normalized).

| RO length | Binary counter width (bit) | $A_{OH}$ (RNS counter) | $A_{OH}$ (binary counter) | $T_{OH}$ (RNS counter) | $T_{OH}$ (binary counter) | $P_{OH}$ (RNS counter) | $P_{OH}$ (binary counter) | $Tmap_{Error}$ (RNS counter) | $Tmap_{Error}$ (binary counter) |
|-----------|----------------------------|-----------------|----------------|-----------------|----------------|-----------------|----------------|-----------------------|----------------|
| 3         | 14                         | 0.389           | 0.913          | 1.126           | 1.185          | 0.875           | 1.162          | 0.452                 | 0.452          |
| 5         | 14                         | 0.433           | 0.956          | 1.037           | 1.067          | 0.814           | 1.097          | 0.663                 | 0.663          |
| 7         | 14                         | 0.476           | 1.000          | 0.941           | 1.000          | 0.718           | 1.000          | 1.000                 | 1.000          |
| 9         | 14                         | 0.520           | 1.044          | 0.889           | 0.948          | 0.624           | 0.908          | 1.136                 | 1.136          |
| 11        | 14                         | 0.564           | 1.087          | 0.837           | 0.881          | 0.558           | 0.841          | 1.173                 | 1.173          |
| 13        | 14                         | 0.607           | 1.131          | 0.785           | 0.837          | 0.496           | 0.779          | 1.232                 | 1.232          |
| 15        | 14                         | 0.651           | 1.175          | 0.741           | 0.807          | 0.445           | 0.727          | 1.279                 | 1.279          |
| 17        | 14                         | 0.695           | 1.218          | 0.711           | 0.770          | 0.405           | 0.688          | 1.294                 | 1.294          |
| 19        | 14                         | 0.738           | 1.262          | 0.659           | 0.696          | 0.378           | 0.660          | 1.316                 | 1.316          |
| 21        | 14                         | 0.782           | 1.305          | 0.593           | 0.652          | 0.353           | 0.635          | 1.347                 | 1.347          |
| 23        | 14                         | 0.825           | 1.349          | 0.563           | 0.615          | 0.331           | 0.604          | 1.424                 | 1.424          |
| 25        | 14                         | 0.869           | 1.393          | 0.533           | 0.585          | 0.306           | 0.583          | 1.489                 | 1.489          |
| 27        | 14                         | 0.913           | 1.436          | 0.511           | 0.563          | 0.298           | 0.573          | 1.533                 | 1.533          |
| 29        | 14                         | 0.956           | 1.480          | 0.496           | 0.556          | 0.294           | 0.569          | 1.567                 | 1.567          |
| 31        | 14                         | 1.000           | 1.524          | 0.496           | 0.556          | 0.294           | 0.566          | 1.585                 | 1.585          |

lower the dynamic power consumption and, hence, the lower the power overhead of the sensor network. As an example, the sensor network constructed using the RNS ring counter and 7-stage ROs consumes 11.1% less power than the one using 5-stage ROs. Moreover, for a specific RO length, the higher the resource utilization of the counter, the higher the $P_{OH}$. For instance, for 7- and 9-stage ROs, the power overhead of the sensor network based on the binary counter is 38.9% and 46.7% higher, respectively, than that of the network based on the RNS ring counter and the same RO length.

An important conclusion from the experimental results in Table 10 is that, although a shorter RO occupies fewer resources and would therefore traditionally be expected to yield a lower $T_{OH}$ and $P_{OH}$, both the thermal and power overheads actually increase because of the higher oscillation frequency of the shorter RO. In other words, the $T_{OH}$ and the $P_{OH}$ of a sensor network are dominated by the ROs' oscillation frequency, not by the resources occupied by the ROs.

Fig. 20 plots the relationship between the sensor sensitivity and the

![](_page_15_Figure_10.jpeg)

Fig. 20. Relationship between the sensitivity and the thermal map error of the examined sensor network for different RO stages.

thermal map error of the examined sensor network for different RO stages. As seen, as the RO length increases, the change in the RO's oscillation frequency over the operating temperature range, i.e. the sensor sensitivity, decreases and, hence, the thermal map error increases. For instance, independent of the counter design, the thermal map error of the sensor network composed of 5-stage ROs is 46.6% higher than that of the network composed of 3-stage ROs. As another example, the thermal map error of the network of sensors constructed using 7-stage ROs is 51.5% worse than that of the network using 5-stage ROs. In summary, based on the experimental results presented in Table 10, the shorter the RO, i.e. the larger the RO's frequency change, the higher the sensor sensitivity, the lower the thermal map error, the lower the area overhead, and the higher the thermal and power overheads, and vice versa.

At this point, to investigate the effect of the counter width on the relative-performance evaluation metrics, we can only explore the width of the binary counter, not that of the RNS ring counter. As mentioned, the maximum count value of the RNS ring counter is obtained by multiplying the lengths of all SRLs. With two SRLs of 31 and 32 bits, the maximum count value is therefore 992, which is far below the value required for a 2-moduli set RNS ring counter to operate reliably, even at the frequency of the 31-stage RO, which has the lowest oscillation frequency in our design space. Hence, because of this implementation restriction, the 3-moduli set RNS ring counter must be used, which keeps the width of the RNS ring counter constant during the DSE. The width of the binary counter can be explored, although not for every RO length. Given the oscillation frequency of the 13- to 31-stage ROs and according to Eq. (23), a 13-bit binary counter (but no narrower) can be used instead of the 14-bit one, meaning that 1 LUT and 1 FF are saved for each sensor in the network. A narrower binary counter, i.e. a 12-bit one, cannot be used for the 31-stage RO and hence not for shorter ROs either, because the frequency of some ROs exceeds 105 MHz in fast regions of the chip owing to intra-die variation, which leads to counter overflow and unreliable operation. Therefore, in this case, the influence of 13- and 14-bit binary counters on the evaluation metrics is investigated.
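The counter-capacity argument can be illustrated with a short Python sketch. It computes the maximum count of an RNS ring counter as the product of its SRL lengths and the smallest binary counter width that avoids overflow for a given RO frequency. Eq. (23) and the sampling period are not reproduced in this section, so the 50 µs window below is a purely hypothetical value chosen for illustration, and the function names are not from the paper.

```python
from math import prod


def rns_max_count(srl_lengths):
    """Maximum count of an RNS ring counter: the product of its SRL lengths."""
    return prod(srl_lengths)


def min_binary_width(freq_hz, window_us):
    """Smallest binary counter width (in bits) that can hold every RO cycle
    counted during one sampling window without overflowing."""
    counts = freq_hz * window_us // 1_000_000  # integer number of cycles
    return counts.bit_length()


# 2-moduli set mentioned in the text: SRLs of 31 and 32 bits.
print(rns_max_count([31, 32]))  # 992

# "Over 105 MHz" is the frequency quoted for fast regions of the chip.
FAST_RO_HZ = 105_000_000
# HYPOTHETICAL sampling window; the paper's actual value is not given here.
HYPOTHETICAL_WINDOW_US = 50
print(min_binary_width(FAST_RO_HZ, HYPOTHETICAL_WINDOW_US))  # 13
```

Under this assumed window, 5250 cycles must be counted, which exceeds both the 992 counts of the 2-moduli set and the 4095 counts of a 12-bit binary counter, while a 13-bit counter suffices. A different sampling period would shift these thresholds.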

Table 11 lists the results of exploring the binary counter width (i.e. 14-bit and 13-bit) for variant RO stages. Since the binary counter is reduced by only one bit, only a marginal improvement is observed in the evaluation criteria, i.e. $A_{OH}$, $T_{OH}$ and $P_{OH}$. Nevertheless, the more compact counter does lower the $A_{OH}$, the $T_{OH}$, and the $P_{OH}$ of the sensor network, albeit slightly. For instance, based on the experimental results presented in Tables 10 and 11, for 13-stage ROs the $A_{OH}$, the $T_{OH}$, and the $P_{OH}$ of the sensor network with 14-bit binary counters are 4.05%, 0.84% and 3.6% higher, respectively, than those of the network with 13-bit binary counters.
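The percentages quoted for the 13-stage case can be reproduced from Tables 10 and 11 when the difference is expressed relative to the 13-bit value. That choice of baseline is an assumption inferred from the numbers, not something stated in the text, and the names below are illustrative.

```python
# Normalized metrics for 13-stage ROs with binary counters.
METRICS_14BIT = {"A_OH": 1.131, "T_OH": 0.837, "P_OH": 0.779}  # Table 10
METRICS_13BIT = {"A_OH": 1.087, "T_OH": 0.830, "P_OH": 0.752}  # Table 11


def percent_difference(value, baseline):
    """Difference of value from baseline, as a percentage of baseline."""
    return 100.0 * (value - baseline) / baseline


# Assumed baseline: the 13-bit design.
gap = {name: percent_difference(METRICS_14BIT[name], METRICS_13BIT[name])
       for name in METRICS_14BIT}

for name, pct in gap.items():
    print(f"{name}: 14-bit counter is {pct:.2f}% above the 13-bit counter")
```

Taking the 14-bit value as the baseline instead would give slightly smaller percentages, so the baseline matters when comparing such small differences.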

### 7.2.4. Quality factor

The relative-performance evaluation results of the various sensor designs, presented in Sections 7.2.2 and 7.2.3, confirm that no single design minimizes all basic evaluation criteria simultaneously, owing to the trade-offs between these metrics. Therefore, an impartial comparison is not possible on the basis of the individual metrics (i.e. area, thermal, and power overheads, and thermal map error). In this section, the efficiency of the different sensor designs is evaluated and compared using the QF metric, and the best design is identified on that basis.

7.2.4.1. Quality factor of the temperature sensor network for constant RO length and counter width. Table 12 lists the QF values, calculated using Eq. (21), of twelve sensor configurations in the network for constant RO length (i.e. 3-stage) and counter width. The values of the evaluation metrics used to calculate the QF are summarized in Table 13.

As seen in Table 12, the first clear result is that, for each RO configuration, the sensor network comprising the RNS ring counter is more efficient than the one comprising the binary counter. For instance, the sensor network based on the proposed sensor (i.e. CFGLUT5 & LD and RNS ring counter), which has the best efficiency of all configurations (QF=4.075), is 3.352 times more efficient than the sensor network composed of the same RO (i.e. CFGLUT5 & LD) and the binary counter. Fig. 21 shows the QF ratio of the sensor network based on the RNS ring counter to that based on the binary counter for the different RO configurations. On average, the network of sensors comprising RNS ring counters is 3.49 times more efficient than the one comprising binary counters. This can be explained by two facts: first, in a sensor network the RNS ring counter occupies less area, consumes less power and, hence, generates less heat than the binary counter; second, the thermal map error of a sensor network does not depend on the counter design, as illustrated in the previous sections. Therefore, for each RO design, the sensor network comprising the RNS ring counter is always more efficient than the one with the alternative counter design, i.e. the binary counter.
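The per-configuration QF ratios and their average can be recomputed directly from the Table 12 values. The following Python sketch does so; the dictionary layout and names are illustrative only.

```python
# QF values copied from Table 12: RO design -> (RNS ring counter, binary counter).
QF_TABLE_12 = {
    "XORCY": (3.408, 0.919),
    "XORCY & LD": (3.498, 0.993),
    "INV": (3.512, 1.000),
    "INV & LD": (3.557, 1.062),
    "CFGLUT5": (3.853, 1.101),
    "CFGLUT5 & LD": (4.075, 1.216),
}

# QF ratio of the RNS-based network to the binary-based one, per RO design.
qf_ratio = {ro: rns / binary for ro, (rns, binary) in QF_TABLE_12.items()}
mean_qf_ratio = sum(qf_ratio.values()) / len(qf_ratio)

# RO design with the highest QF when paired with the RNS ring counter.
best_design = max(QF_TABLE_12, key=lambda ro: QF_TABLE_12[ro][0])

for ro, ratio in qf_ratio.items():
    print(f"{ro:13s} RNS/binary QF ratio = {ratio:.3f}")
print(f"mean ratio = {mean_qf_ratio:.2f}, best RNS-based design = {best_design}")
```

Because the thermal map error is identical for both counters, each ratio reflects only the differences in the area, thermal, and power overheads.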

The second observation is that, regardless of the counter design, the

Table 11

Relative-performance evaluation results of 30 RO-based temperature sensor designs in the network for variant binary counter width in terms of the area overhead (Eq. (13)), the thermal overhead (Eq. (14)), the power overhead (Eq. (16)) and the thermal map error (Eq. (20)) (normalized).

| Design    |                            | A <sub>OH</sub> |                | T <sub>OH</sub> |                | P <sub>OH</sub> |                | Tmap <sub>Error</sub> |                |
|-----------|----------------------------|-----------------|----------------|-----------------|----------------|-------------|----------------|-----------------------|----------------|
| RO length | Binary counter width (bit) | RNS counter     | Binary counter | RNS counter     | Binary counter | RNS counter | Binary counter | RNS counter           | Binary counter |
| 3         | 14                         | 0.389           | 0.913          | 1.126           | 1.185          | 0.875       | 1.162          | 0.452                 | 0.452          |
| 5         | 14                         | 0.433           | 0.956          | 1.037           | 1.067          | 0.814       | 1.097          | 0.663                 | 0.663          |
| 7         | 14                         | 0.476           | 1.000          | 0.941           | 1.000          | 0.718       | 1.000          | 1.000                 | 1.000          |
| 9         | 14                         | 0.520           | 1.044          | 0.889           | 0.948          | 0.624       | 0.908          | 1.136                 | 1.136          |
| 11        | 14                         | 0.564           | 1.087          | 0.837           | 0.881          | 0.558       | 0.841          | 1.173                 | 1.173          |
| 13        | 13                         | 0.607           | 1.087          | 0.785           | 0.830          | 0.496       | 0.752          | 1.232                 | 1.232          |
| 15        | 13                         | 0.651           | 1.131          | 0.741           | 0.800          | 0.445       | 0.701          | 1.279                 | 1.279          |
| 17        | 13                         | 0.695           | 1.175          | 0.711           | 0.763          | 0.405       | 0.662          | 1.294                 | 1.294          |
| 19        | 13                         | 0.738           | 1.218          | 0.659           | 0.689          | 0.378       | 0.635          | 1.316                 | 1.316          |
| 21        | 13                         | 0.782           | 1.262          | 0.593           | 0.644          | 0.353       | 0.608          | 1.347                 | 1.347          |
| 23        | 13                         | 0.825           | 1.305          | 0.563           | 0.607          | 0.331       | 0.578          | 1.424                 | 1.424          |
| 25        | 13                         | 0.869           | 1.349          | 0.533           | 0.578          | 0.306       | 0.558          | 1.489                 | 1.489          |
| 27        | 13                         | 0.913           | 1.393          | 0.511           | 0.556          | 0.298       | 0.548          | 1.533                 | 1.533          |
| 29        | 13                         | 0.956           | 1.436          | 0.496           | 0.548          | 0.294       | 0.543          | 1.567                 | 1.567          |
| 31        | 13                         | 1.000           | 1.480          | 0.496           | 0.548          | 0.294       | 0.542          | 1.585                 | 1.585          |


Table 12

Comparison of the efficiency of twelve sensor designs in the network for six RO configurations and two counter designs.

| RO design    | QF (Eq. (21)), RNS ring counter | QF (Eq. (21)), binary counter |
|--------------|---------------------------------|-------------------------------|
| XORCY        | 3.408                           | 0.919                         |
| XORCY & LD   | 3.498                           | 0.993                         |
| INV          | 3.512                           | 1.000                         |
| INV & LD     | 3.557                           | 1.062                         |
| CFGLUT5      | 3.853                           | 1.101                         |
| CFGLUT5 & LD | 4.075                           | 1.216                         |

QF value of the network of sensors composed of ROs that include latches (i.e. XORCY & LD, INV & LD, and CFGLUT5 & LD) is higher than that of the corresponding ROs without latches, i.e. XORCY, INV, and CFGLUT5. As an example, the efficiency of the sensor network constructed using the proposed RO (i.e. CFGLUT5 & LD) and the conventional binary counter (QF=1.216) is 10.4% higher than that of the corresponding RO without latches (i.e. CFGLUT5). The explanation is that, although the ROs that include open latches occupy more area in a sensor network than the ROs without latches, their lower oscillation frequency gives the sensor networks built from them (i.e. XORCY & LD, INV & LD, and CFGLUT5 & LD) lower thermal and power overheads. In addition, as noted, the thermal map error of these networks is lower because a latch in the open state is used along with each inverting element. Therefore, utilizing open latches in the RO chain improves the efficiency of an RO-based temperature sensor and thus raises the QF value of the sensor network.

7.2.4.2. Quality factor of the temperature sensor network for variant RO length and counter width. Table 14 lists the QF values of 30 designs of the RO-based temperature sensor in the network for variant RO length (i.e. 3- to 31-stage) and two counter designs with constant width. Regardless of the RO length, the sensor network comprising the RNS ring counter is more efficient than the one comprising the binary counter owing to its lower $A_{OH}$, $T_{OH}$ and $P_{OH}$. For instance, the efficiency of the sensor networks based on the RNS ring counter and 3- and 5-stage ROs is 3.28 and 3.06 times higher, respectively, than that of the networks based on the binary counter. Overall, the first clear result in Table 14 is that the sensor network composed of the RNS ring counter is on average 3.21 times more efficient than the one composed of the binary counter.
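Eq. (21) is defined earlier in the paper and is not reproduced in this section. The Python sketch below assumes that, for normalized metrics, it reduces to the reciprocal of the product of the four metrics. This assumed form reproduces the Table 14 entries from the Table 10 metrics to within rounding, but the exact definition should be taken from Eq. (21) itself. All names are illustrative.

```python
def quality_factor(a_oh, t_oh, p_oh, tmap_error):
    """ASSUMED form of Eq. (21): reciprocal of the product of the four
    normalized metrics (lower overheads and error give a higher QF)."""
    return 1.0 / (a_oh * t_oh * p_oh * tmap_error)


# (A_OH, T_OH, P_OH, Tmap_Error) copied from Table 10 for three RO lengths.
TABLE_10_METRICS = {
    3: {"rns": (0.389, 1.126, 0.875, 0.452), "binary": (0.913, 1.185, 1.162, 0.452)},
    5: {"rns": (0.433, 1.037, 0.814, 0.663), "binary": (0.956, 1.067, 1.097, 0.663)},
    7: {"rns": (0.476, 0.941, 0.718, 1.000), "binary": (1.000, 1.000, 1.000, 1.000)},
}

# QF values copied from Table 14 for the same designs.
TABLE_14_QF = {
    3: {"rns": 5.775, "binary": 1.760},
    5: {"rns": 4.133, "binary": 1.349},
    7: {"rns": 3.109, "binary": 1.000},
}

computed_qf = {
    length: {counter: quality_factor(*metrics) for counter, metrics in row.items()}
    for length, row in TABLE_10_METRICS.items()
}

for length, row in computed_qf.items():
    for counter, qf in row.items():
        print(f"{length:2d}-stage, {counter:6s}: computed {qf:.3f}, "
              f"Table 14 {TABLE_14_QF[length][counter]:.3f}")
```

The small residual differences are expected because the inputs are the three-decimal values printed in Table 10 rather than the unrounded measurements.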

![](_page_17_Figure_8.jpeg)

Fig. 21. QF ratio of the sensor network based on the RNS ring counter to the binary counter for six RO configurations.

Table 14

Comparison of the efficiency of 30 sensor designs in the examined network for variant RO length from 3 to 31 (normalized).

| RO length | Binary counter width (bit) | QF (Eq. (21)), RNS ring counter | QF (Eq. (21)), binary counter |
|-----------|----------------------------|---------------------------------|-------------------------------|
| 3         | 14                         | 5.775                           | 1.760                         |
| 5         | 14                         | 4.133                           | 1.349                         |
| 7         | 14                         | 3.109                           | 1.000                         |
| 9         | 14                         | 3.050                           | 0.980                         |
| 11        | 14                         | 3.237                           | 1.058                         |
| 13        | 14                         | 3.432                           | 1.101                         |
| 15        | 14                         | 3.649                           | 1.134                         |
| 17        | 14                         | 3.864                           | 1.198                         |
| 19        | 14                         | 4.135                           | 1.310                         |
| 21        | 14                         | 4.547                           | 1.374                         |
| 23        | 14                         | 4.559                           | 1.402                         |
| 25        | 14                         | 4.738                           | 1.413                         |
| 27        | 14                         | 4.693                           | 1.407                         |
| 29        | 14                         | 4.575                           | 1.365                         |
| 31        | 14                         | 4.325                           | 1.316                         |

The second important observation concerns the variation of the QF values as the RO length increases. As seen in Table 14, for a constant counter width, the sensor networks that consist of 3-stage and 9-stage ROs have the highest and the lowest QF values, respectively. As the RO length increases from 3 to 9 stages, the efficiency of the sensor network decreases. In the second phase, from the 11-stage up to the 25-stage RO, the QF value increases, and finally it decreases smoothly for RO lengths of 25–31. This trend can be explained as follows. As the RO length increases, the thermal map error increases markedly because of the reduced sensor sensitivity, and the $A_{OH}$ increases noticeably too. However, because a proper sampling period is chosen, the variation of the $P_{OH}$ and, especially, the $T_{OH}$ is smaller than that of the other two metrics (i.e. $A_{OH}$ and $Tmap_{Error}$). Therefore, for the sensor networks comprising the 3-stage up to the 9-stage RO, the efficiency trend is descending. Then, from the RO length of 11 up to 25, the $T_{OH}$ and the $P_{OH}$ become dominant, resulting in an ascending efficiency trend. Next, from the 27-stage RO onward, the $T_{OH}$ and the $P_{OH}$ vary only slightly because the oscillation frequency of the RO changes little, which makes the $A_{OH}$

Table 13

Survey of relative-performance evaluation results of 12 RO-based temperature sensor designs in the network for constant RO length and counter width in terms of the area overhead (Eq. (13)), the thermal overhead (Eq. (14)), the power overhead (Eq. (16)) and the thermal map error (Eq. (20)) (normalized).

| Design       | A <sub>OH</sub> |                | T <sub>OH</sub> |                | P <sub>OH</sub> |                | Tmap <sub>Error</sub> |                |
|--------------|-----------------|----------------|-----------------|----------------|-----------------|----------------|-----------------------|----------------|
|              | RNS counter     | Binary counter | RNS counter     | Binary counter | RNS counter     | Binary counter | RNS counter           | Binary counter |
| XORCY        | 0.41            | 1.00           | 0.87            | 0.95           | 0.70            | 0.97           | 1.18                  | 1.18           |
| XORCY & LD   | 0.48            | 1.07           | 0.83            | 0.93           | 0.69            | 0.95           | 1.06                  | 1.06           |
| INV          | 0.41            | 1.00           | 0.95            | 1.00           | 0.74            | 1.00           | 1.00                  | 1.00           |
| INV & LD     | 0.48            | 1.07           | 0.91            | 0.98           | 0.72            | 0.99           | 0.90                  | 0.90           |
| CFGLUT5      | 0.38            | 0.98           | 1.05            | 1.09           | 0.80            | 1.07           | 0.80                  | 0.80           |
| CFGLUT5 & LD | 0.43            | 1.02           | 1.01            | 1.06           | 0.78            | 1.05           | 0.72                  | 0.72           |

Table 15

Comparison of the efficiency of various sensor designs in the examined network for variant binary counter width (normalized).

| RO length | Binary counter width (bit) | QF (Eq. (21)), RNS ring counter | QF (Eq. (21)), binary counter |
|-----------|----------------------------|---------------------------------|-------------------------------|
| 3         | 14                         | 5.775                           | 1.760                         |
| 5         | 14                         | 4.133                           | 1.349                         |
| 7         | 14                         | 3.109                           | 1.000                         |
| 9         | 14                         | 3.050                           | 0.980                         |
| 11        | 14                         | 3.237                           | 1.058                         |
| 13        | 13                         | 3.432                           | 1.196                         |
| 15        | 13                         | 3.649                           | 1.232                         |
| 17        | 13                         | 3.864                           | 1.303                         |
| 19        | 13                         | 4.135                           | 1.426                         |
| 21        | 13                         | 4.547                           | 1.501                         |
| 23        | 13                         | 4.559                           | 1.533                         |
| 25        | 13                         | 4.738                           | 1.544                         |
| 27        | 13                         | 4.693                           | 1.539                         |
| 29        | 13                         | 4.575                           | 1.493                         |
| 31        | 13                         | 4.325                           | 1.436                         |

dominant. Furthermore, since both the $A_{OH}$ and the $Tmap_{Error}$ grow with the RO length, the efficiency of the sensor network decreases and the QF trend remains descending to the end of the explored range.
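The three phases of the QF trend described for Table 14 can be located programmatically. The following Python sketch finds the RO lengths at which the trend changes direction, using the QF columns of Table 14; the function name is illustrative.

```python
RO_LENGTHS = list(range(3, 32, 2))  # 3, 5, ..., 31 stages

# QF values copied from Table 14 (14-bit binary counter, constant width).
QF_RNS = [5.775, 4.133, 3.109, 3.050, 3.237, 3.432, 3.649, 3.864,
          4.135, 4.547, 4.559, 4.738, 4.693, 4.575, 4.325]
QF_BINARY = [1.760, 1.349, 1.000, 0.980, 1.058, 1.101, 1.134, 1.198,
             1.310, 1.374, 1.402, 1.413, 1.407, 1.365, 1.316]


def turning_points(lengths, values):
    """RO lengths at which the QF trend switches direction, i.e. interior
    points that are a local minimum or a local maximum."""
    points = []
    for i in range(1, len(values) - 1):
        slope_before = values[i] - values[i - 1]
        slope_after = values[i + 1] - values[i]
        if slope_before * slope_after < 0:
            points.append(lengths[i])
    return points


print("RNS ring counter:", turning_points(RO_LENGTHS, QF_RNS))     # [9, 25]
print("Binary counter:  ", turning_points(RO_LENGTHS, QF_BINARY))  # [9, 25]
```

Both counter designs change direction at the same RO lengths, a local minimum at 9 stages and a local maximum at 25 stages, while the overall maximum remains at the 3-stage RO.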

Table 15 shows the QF values of the examined temperature sensor network for variant widths of the binary counter. As noted in Section 7.2.3, the 13-bit binary counter can be used instead of the 14-bit one for networks of sensors composed of 13-stage up to 31-stage ROs but, as expected, this one-bit reduction of the counter width does not result in a notable improvement in the efficiency of the sensor network. According to the results presented in Tables 14 and 15, for 13- up to 31-stage ROs, the sensor network built with 13-bit binary counters is on average 5.96% more efficient than the one built with 14-bit binary counters. In summary, according to the experimental results, the sensor network comprising the proposed sensor, which is composed of the 3-stage RO (i.e. CFGLUT5 & LD) and the 3-moduli set RNS ring counter, has the best efficiency among all alternative designs, i.e. the various sensor configurations, RO lengths, and counter widths.

#### 8. Conclusions and future work

Constructing an efficient, precise, and reliable temperature sensor network plays a vital role in measuring the temperature distribution, sensing local temperatures, and monitoring the thermal behavior of the chip at run-time, so that DTM techniques can be applied effectively to counter the negative effects of rising on-chip temperature before they occur. On FPGAs, such a network is realized with RO-based temperature sensors. Various designs of the RO-based temperature sensor exist in the literature; they differ in their characteristics and therefore affect the efficiency of the sensor network. In this paper, a new notion of the network of RO-based temperature sensors has been presented. Four useful criteria (i.e. area, thermal, and power overheads, and thermal map error) and some measurement methods have been introduced to evaluate and compare the relative performance of various sensor networks. Then, on the basis of the trade-offs between these criteria, the QF metric has been proposed to characterize the efficiency of each design, which can help designers choose an optimal design based on the QF values obtained. Moreover, a novel structure of the RO-based temperature sensor has been proposed in this work that occupies 37.5% fewer resources than the most compact design and provides 2.72 times more sensitivity than the most sensitive sensor design. Based on the experimental results, regardless of the RO configuration, the sensor network comprising the RNS ring counter is on average 3.49 times more efficient than the one comprising the binary counter. Furthermore, different designs of the sensor in the network have been explored for variant RO lengths and counter widths in order to find the best design. Based on the results obtained experimentally, the sensor network composed of the proposed sensor, i.e. CFGLUT5 & LD and RNS ring counter, has the best efficiency among the alternative designs.

In future work, we intend to implement various sensor networks on new FPGA devices and analyze the results in order to study the influence of state-of-the-art technologies on the efficiency of RO-based temperature sensor networks.

# References

- M. Pedram, S. Nazarian, Thermal modeling, analysis, and management in vlsi circuits: principles and methods, Proc. IEEE 94 (2006) 1487–1501. http:// dx.doi.org/10.1109/JPROC.2006.879797.
- [2] S. Xie, W.T. Ng, An all-digital self-calibrated delay-line based temperature sensor for VLSI thermal sensing and management, Integr. VLSI J. 51 (2015) 107–117. http://dx.doi.org/10.1016/j.vlsi.2015.07.008.
- [3] I. Xilinx, Virtex-5 FPGA system monitor user guide, UG192 v1.7.1 Ed., 2011.
- [4] Altera Inc, Stratix IV Device Handbook, Volume 4, Device Datasheet and Addendum, 2014. Available online: https://www.altera.com/literature/hb/stratixiv/stx4\_5v4.pdf
- [5] S. Lopez-Buedo, J. Garrido, E. Boemo, Dynamically inserting, operating, and eliminating thermal sensors of FPGA-based systems, IEEE Trans. Compon. Packag. Technol. 25 (2002) 561–566. http://dx.doi.org/10.1109/TCAPT.2002.808011.
- [6] M.Happe, A.Agne, C.Plessl, Measuring and predicting temperature distributions on FPGAs at run-time, in: Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), IEEE, 2011, pp. 55–60 (http:// dx.doi.org/10.1109/ReConFig.2011.59).
- [7] K.M. Zick, J.P. Hayes, Low-cost sensing with ring oscillator arrays for healthier reconfigurable systems, ACM Trans. Reconfig. Technol. Syst. 5 (2012) 1–26. http:// dx.doi.org/10.1145/2133352.2133353.
- [8] P. Weber, M. Zagrabski, B. Wojciechowski, M. Nikodem, K. Kępa, K.S. Berezowski, Calibration of RO-based temperature sensors for a toolset for measuring thermal behavior of FPGA devices, Microelectron. J. 45 (2014) 1753–1763. http:// dx.doi.org/10.1016/j.mejo.2014.06.004.
- [9] Y. Yue, F. Shi-Wei, G. Chun-Sheng, Y. Xin, F. Rui-Rui, All-digital thermal distribution measurement on field programmable gate array using ring oscillators, Microelectron. Reliab. 55 (2015) 396–401. http://dx.doi.org/10.1016/j.microrel.2014.10.010.
- [10] S.Lopez-Buedo, E.Boemo, Making visible the thermal behaviour of embedded microprocessors on FPGAs: a progress report, in: Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), ACM, 2004, pp. 79–86. (http://dx.doi.org/10.1145/968280.968293).
- [11] B. Lee, K.S. Chung, B. Koo, N.W. Eum, T. Kim, Thermal sensor allocation and placement for reconfigurable systems, ACM Trans. Des. Autom. Electron. Syst. 14 (2009) 1–23. http://dx.doi.org/10.1145/1562514.1562518.
- [12] J.J.L.Franco, E.Boemo, E.Castillo, L.Parrilla, Ring oscillators as thermal sensors in FPGAs: experiments in low voltage, in: Proceedings of the Southern Conference on Programmable Logic (SPL), IEEE, 2010, pp. 133–137. (http://dx.doi.org/10.1109/ SPL.2010.5483027).
- [13] C.Ruething, A.Agne, M.Happe, C.Plessl, Exploration of ring oscillator design space for temperature measurements on FPGAs, in: Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), IEEE, 2012, pp. 559–562. (http://dx.doi.org/10.1109/FPL.2012.6339370).
- [14] N.Rahmanikia, A.Amiri, H.Noori, F.Mehdipour, Exploring efficiency of ring oscillator-based temperature sensor networks on FPGAs (abstract only), in: Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), ACM, 2015, p. 264 (http://dx.doi.org/10. 1145/2684746.2689104).
- [15] S. Magadum, K.V. Naveen, H. Guhilot, Sensor less distributed temperature sensor and control using FPGA, Int. J. Eng. Innov. Technol. 1 (2012) 295–298.
- [16] G.M. Quenot, N. Paris, B. Zavidovique, A temperature and voltage measurement cell for VLSI circuits, in: Euro ASIC, IEEE, 1991, pp. 334–338. (doi:10.1109/ EUASIC.1991.212842).
- [17] T.A. Demassa, Z. Ciccone, Digital Integrated Circuits, Wiley, New York, 1996.
- [18] I.M. Filanovsky, A. Allam, Mutual compensation of mobility and threshold voltage temperature effects with applications in CMOS circuits, IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 48 (2001) 876–884. http://dx.doi.org/10.1109/81.933328.
- [19] A. Ajami, K. Banerjee, M. Pedram, Modeling and analysis of nonuniform substrate temperature effects on global ULSI interconnects, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 24 (2005) 849–861. http://dx.doi.org/10.1109/TCAD.2005.847944.
- [20] C. Tradowsky, E. Cordero, T. Deuser, M. Hubner, J. Becker, Determination of on-chip temperature gradients on reconfigurable hardware, in: Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig),

- [21] Altera Inc., Altera temperature sensor IP core user guide, UG-01074, 2015.
- [22] P. Chen, M.C. Shie, Z.Y. Zheng, Z.F. Zheng, C.Y. Chu, A fully digital time-domain smart temperature sensor realized with 140 FPGA logic elements, IEEE Trans. Circuits Syst. I: Regul. Pap. 54 (2007) 2661–2668. http://dx.doi.org/10.1109/TCSI.2007.906073.
- [23] S. Xie, W.T. Ng, Delay-line temperature sensors and VLSI thermal management demonstrated on a 60 nm FPGA, in: Proceedings of the International Symposium on Circuits and Systems (ISCAS), IEEE, 2014, pp. 2571–2574. (http://dx.doi.org/10.1109/ISCAS.2014.6865698).
- [24] A. Agne, H. Hangmann, M. Happe, M. Platzner, C. Plessl, Seven recipes for setting your FPGA on fire – a cookbook on heat generators, Microprocess. Microsyst. 38 (2014) 911–919. http://dx.doi.org/10.1016/j.micpro.2013.12.001.
- [25] S. Velusamy, W. Huang, J. Lach, M. Stan, K. Skadron, Monitoring temperature in FPGA based SoCs, in: Proceedings of the International Conference on Computer Design: VLSI in Computers and Processors (ICCD), IEEE, 2005, pp. 634–637. (http://dx.doi.org/10.1109/ICCD.2005.78).
- [26] Xilinx Inc., Virtex-5 libraries guide for HDL designs, UG621 v14.5 Ed., 2013.
- [27] W. Kester, Practical Design Techniques for Sensor Signal Conditioning, Prentice Hall, Analog Devices Inc., 1999.
- [28] Xilinx Inc., Constraints guide, UG625 v14.5 Ed., 2013.
- [29] M.A. Sayed, P.H. Jones, Characterizing non-ideal impacts of reconfigurable hardware workloads on ring oscillator-based thermometers, in: Proceedings of the International Conference on Reconfigurable Computing and FPGAs (ReConFig), IEEE, 2011, pp. 92–98. (http://dx.doi.org/10.1109/ReConFig.2011.18).
- [30] C. Ding, D. Pei, A. Salomaa, Chinese Remainder Theorem: Applications In Computing, Coding, Cryptography, World Scientific Publishing Co., Inc., River Edge, NJ, USA, 1996.
- [31] S. Bhoj, D. Bhatia, A dynamic temperature control simulation system for FPGAs, in: Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), IEEE, 2008, pp. 659–662. (http://dx.doi.org/10.1109/FPL.2008.4630033).

- [32] K. Skadron, M.R. Stan, K. Sankaranarayanan, W. Huang, S. Velusamy, D. Tarjan, Temperature-aware microarchitecture: modeling and implementation, ACM Trans. Archit. Code Optim. 1 (2004) 94–125. http://dx.doi.org/10.1145/980152.980157.
- [33] R. Mukherjee, S. Mondal, S.O. Memik, Thermal sensor allocation and placement for reconfigurable systems, in: Proceedings of the International Conference on Computer-Aided Design (ICCAD), IEEE, 2006, pp. 437–442. (http://dx.doi.org/10.1109/ICCAD.2006.320153).
- [34] S. Mondal, R. Mukherjee, S.O. Memik, Fine-grain thermal profiling and sensor insertion for FPGAs, in: Proceedings of the International Symposium on Circuits and Systems (ISCAS), IEEE, 2006, pp. 4387–4390. (http://dx.doi.org/10.1109/ISCAS.2006.1693601).
- [35] P.H. Jones, J. Moscola, Y.H. Cho, J.W. Lockwood, Adaptive thermoregulation for applications on reconfigurable devices, in: Proceedings of the International Conference on Field Programmable Logic and Applications (FPL), IEEE, 2007, pp. 246–253. (http://dx.doi.org/10.1109/FPL.2007.4380655).
- [36] E. Boemo, S. Lopez-Buedo, Thermal monitoring on FPGAs using ring-oscillators, in: Proceedings of the International Workshop on Field Programmable Logic and Applications (FPL), IEEE, 1997, pp. 69–78.
- [37] S. Lopez-Buedo, E. Boemo, A method for temperature measurement on reconfigurable systems, in: Proceedings of the Design of Circuit and Integrated Systems Conference, Citeseer, 1997, pp. 727–730.
- [38] D. Sheldon, R. Roosta, M. Sadigursky, A. Farrokhy, Monitoring temperature in SRAM-based FPGAs using a ring-oscillator design, in: Proceedings of the Military and Aerospace FPGA and Applications Meeting, 2007, pp. 1–12.
- [39] P. Mangalagiri, S. Bae, R. Krishnan, Y. Xie, V. Narayanan, Thermal-aware reliability analysis for platform FPGAs, in: Proceedings of the International Conference on Computer-Aided Design (ICCAD), IEEE, 2008, pp. 722–727. (http://dx.doi.org/10.1109/ICCAD.2008.4681656).