Secure JTAG Implementation Using Schnorr Protocol

Amitabh Das · Jean Da Rol t · Santosh Ghosh · Stefaan Seys · Sophie Dupuis · Giorgio Di Natale · Marie-Lise Flottes · Bruno Rouzeyre · Ingrid Verbauwhede

Received: 30 May 2012 / Accepted: 8 March 2013
© Springer Science+Business Media New York 2013

Abstract The standard IEEE 1149.1 (Test Access Port and Boundary-Scan Architecture, also known as JTAG port) provides a useful interface for embedded systems development, debug, and test. In an 1149.1-compatible integrated circuit, the JTAG port allows the circuit to be easily accessed from the external world, and even to control and observe the internal scan chains of the circuit. However, the JTAG port can be also exploited by attackers to mount several cryptographic attacks. In this paper we propose a novel architecture that implements a secure JTAG interface. Our JTAG scheme allows for mutual authentication between the device and the tester. In contrast to previous work, our scheme uses provably secure asymmetric-key based authentication and verification protocols. The complete scheme is implemented in hardware and integrated with the standard JTAG interface. Detailed area and timing results are also presented.

Keywords JTAG · Secure testing · IP protection · Secure code and firmware updates · Cryptographic circuits · Schnorr protocol · Elliptic curve cryptography · Mutual authentication

1 Introduction

Joint Test Action Group (JTAG) is the common name for what was later standardized as the IEEE 1149.1 Standard Test Access Port and Boundary-Scan Architecture [15]. JTAG has remained as the ubiquitous test and debug interface standard for circuits and printed circuit boards in the semiconductor industry for more than two decades. The companion standard, IEEE Standard 1532 (Boundary-Scan-Based In-System Configuration of Programmable Devices) has extended JTAG to support on-board programming [14]. A current IEEE standard proposal (P1687, also known as Internal JTAG) seeks to further enhance JTAG by allowing block transfer of data and special instruction sets in order to speed up In-System Programmability.

JTAG was initially designed without a concern for security. As the capability of hardware attackers is increasing, more and more side-channels are discovered, which can compromise the security of a device. One such important side-channel is the improper use of the JTAG port. There have been many practical attacks on secure devices such as set-top box (STB) decoders using the JTAG interface [21]. ARM11 (Cortex) microcontroller, which is used in latest smartphones, has extensive test and debug facilities through the JTAG port. This is a well-known backdoor that is
currently used for instance to jailbreak iPhones/iPad, or to unlock protected services in mobile phones [10]. Even if not documented, it is reasonable to think that JTAG could be used to compromise the security of other applications such as mobile e-payments, or Wireless Sensor Nodes (WSNs) [3, 13].

Another security flaw due to JTAG is related to FPGAs. The configuration bitstream which contains the Intellectual Property (IP) information of a reconfigurable design is mostly programmed via the JTAG interface into FPGAs [31]. The firmware update of set-top boxes used in pay-TV subscriptions also happens in most cases through the JTAG port. An insecure JTAG access would allow on one side to re-program parts of the system at the hacker’s will, and on the other side it could be used to sniff configuration bits thus allowing retrieving the IP information.

Though there are several approaches for securing the JTAG interface, which can be found in the literature [6, 24–26, 28], most of them are based on symmetric-key approaches. They have an inherent key management problem. This is what we intend to overcome through the use of Public-key Cryptography (PKC) in our secure JTAG scheme. Though there is previous work on a protected JTAG scheme using ECC-based authentication protocol [5], the scheme uses PKC in a non-standard way causing key-management problems. Moreover, the paper also does not present any timing or area results. Though several PKC protocols can be used for establishing a secure authentication for JTAG, we use the ECC-based Schnorr protocol which is an efficient and provably secure protocol [30].

In this paper, we seek to provide security features to the IEEE 1149.1 JTAG interface by including a Schnorr-based secure test protocol, and present an efficient hardware implementation of the protocol using elliptic curve cryptography. Moreover, our approach does not make any modifications to the existing JTAG interface. To the best of our knowledge, this is the first paper that proposes a mechanism for mutual authentication between the secure device and the tester based on a well known and studied public key authentication protocol. Earlier work is either based on symmetric key systems or only proposes one way authentication, limiting the scenarios in which these systems can be used. The area requirement to incorporate this secure test infrastructure on the JTAG has been optimized to increase the scope of our proposed scheme in a wide range of application scenarios.

The rest of the paper is structured as follows. In Section 2, we present the past work that has been done in the area of secure JTAG implementation. A comparison of these approaches with our scheme is also made. The motivation for our work is given in Section 3. Section 4 presents the attacker model for the JTAG mechanism, and the idea for the Schnorr-based authentication protocol that is the basis of our secure JTAG strategy. This section also includes a discussion on the public key authenticity. Section 5 presents the Secure JTAG implementation. Two designs are presented using affine and projective coordinates. The area and timing results are shown in Section 6 and we conclude the paper in Section 7.

2 Previous Work

An ordinary JTAG standard [15] consists of a pre-defined interface, containing a serial input called TDI, a serial output called TDO, an input for the clock TCK and a mode select input called TMS. By controlling the TMS signal, the user can travel between the 16 states of the JTAG finite state machine, shown in Appendix F. Then, the request for executing the instructions and the transference of data between the circuit and the host is performed by connecting the input TDI and the output TDO to internal shift registers. Thus a malicious host can manipulate the JTAG inputs and execute any instruction.

One of the first approaches for implementing a secure JTAG appears in [24]. It presents a locking/unlocking mechanism for controlling the access to the JTAG instructions. It is based on storing a secret key inside the chip boundaries. To gain access to the JTAG features the user must shift in the secret key, otherwise the JTAG bypasses all the data on the TDI input to the TDO output. The scope of this approach does not consider the case where a fake circuit requests updates which may compromise the intellectual property.

A detailed evaluation of the JTAG test standard, its security problems, attackers’ capabilities, possible attacks and countermeasures has been done in [28]. It presents a JTAG security protocol using a stream cipher (Trivium), hash function and a message authentication code. The authors suppose that the service server is trusted, performing one-way authentication. However, to protect the data from unauthorized servers, the data is encrypted.

An anti-tamper JTAG Test Access Port (TAP) is described in [6] that uses SHA-256 secure hash and a true random number generator (TRNG) to create a low gate overhead challenge/response based access system employing an on-chip internal JTAG P1687 instrument. It is mentioned by the authors that malicious designers could modify the designs in order to observe the secret key, implying a one-way authentication scenario.

A multi-level security access system for controlling access to individual scan cells for preventing malicious eavesdropping from being loaded into the JTAG controller is presented in [26]. This approach also supposes the design is trusted, and thus it is not feasible for fake circuits to obtain proprietary updates.

An elaborate three-party secure JTAG protocol using certificates involving SHA-1 hash algorithm, AES block cipher and several arithmetic operators is presented in...
The authors describe the possible attack cases, but the protocol is not proven to be secure.

There are also industrial solutions for providing security to the JTAG interface. The Secure JTAG Controller (SJC) which features in Freescale Semiconductors i.MX31 and i.MX31L Multimedia Applications Processors is one such example. Similarly there are tools available from various vendors such as Discreetix and Lauterbach TRACE32 PowerTools, which provides Secure JTAG Debug module giving OEMs a highly secure, authenticated way to debug SoC errors throughout a system’s lifetime. A detailed overview of the JTAG related fuses and security features in the AVR microcontroller can be found in [11]. Some use-cases and application scenarios involving JTAG security are presented in [27].

To the best of our knowledge, the work in [5] is the only JTAG security solution that is also based on asymmetric key cryptography. In contrast to our work, this solution only provides one-way authentication from the test server to the JTAG device. Moreover, this solution does not improve key management related to symmetric solutions, as it requires the test server to have secure access to a database that contains all the unique private keys related to each device. Because of the non-standard setup of the authentication protocol, every JTAG device has a unique private key that is stored in a database. This key is retrieved by the test server in order to authenticate itself to the device. In our solution, we employ a standard use of public/private keys in which the prover uses its own private key and a certificate issued by a CA to prove its authenticity and not a private key related to the verifier as in [5].

Most of the previous approaches [5, 6, 24, 26, 28] suppose a one-way authentication, where either the circuit or the server is considered trusted. In this paper, we propose a suitable solution in cases where neither the circuit nor the server is trusted. Additionally, most of the previous solutions are not provably secure as the Schnorr protocol used in this paper.

3 Motivation

JTAG is mainly used for manufacturing and in-the-field test-and-diagnosis of VLSI circuits and boards. It may be disabled in the chip die after initialization of a product. However, there are some applications where JTAG is kept enabled for code or firmware updates. Especially, in case of reconfigurable devices like set-top boxes, where even remote reprogramming may be done through JTAG port based on updates received from the service provider. For instance, the STi7101 low-cost HD TV set-top box decoder and the TI MSP430 used in some set-top boxes have the JTAG open for product support and service.

In this work, we solve the inherent key-management problem of existing Symmetric-Key Cryptography (SKC) based secure JTAG approaches using Public-Key Cryptography (PKC). Specifically, if SKC is used for securing JTAG, there will be a common master secret key for all products or a large secret-key database needs to be maintained at the tester/updater side, which are not good options for mass electronic products. PKC implementations are inherently more hardware expensive and slower than SKC based approaches. Therefore it is a challenging task to incorporate PKC in a resource-constrained environment like JTAG.

The use of asymmetric primitives and the related public/private key pairs substantially improve the complexity involved in key management in this setting of tester against the device. If we take for example the automobile industry, then we expect to bring our car to virtually any garage in the world and get our car serviced. Servicing cars now also includes updating software in one of the on-board units (OBUs) which may be through the JTAG interface. Currently these updates can be pushed to the OBU as soon as it is powered on; no other security measures are used. One of most important reasons for the current lack of authentication is the fact that it presents car manufacturers with a large key management problem that is inherent to the use of symmetric solutions in large scale systems. In symmetric solutions, the verifier needs a copy of the same key that was also used to generate the authentication token (e.g., a message authentication code or MAC on the firmware). This implies that the use of a single master key is very risky as it will be spread in many devices and likely to leak at one point in time (see Section 4 for more details on who is the verifier in our approach). Therefore, symmetric key based solutions require unique keys to be installed at every verifier. In large scale systems, this would require a large database that link the identity of the prover to its key and a means for verifiers to securely access and authenticate this service. Alternatively, key derivation schemes could be used, but they only lower the risk related to a single master key. There are many other application scenarios where similar key-management problems can occur.

To overcome this problem, the solution proposed in this paper offers the possibility of using certificates instead of shared symmetric keys. This would, for example, allow the use of the same signed firmware update for a wide range of OBUs, without the risk of installing the same symmetric key in this range of devices. They just need a valid copy of the manufacturers’ public key for signature verification.

4 Proposed Secure JTAG Scheme

4.1 Attacker Model

We have considered the following application scenarios for our attacker model. The JTAG interface of a VLSI circuit is
normally used for testing the device, as well as for updating
the internal code and firmware in some applications. We
assume that the external JTAG interface of the target device
is enabled and is accessible to the attacker. In [28], the
attacker models are described based on malicious IP cores
inside a SoC. However, in this paper, we have considered
the following two attacker scenarios where internal IP cores
are assumed to be trusted, and the attacker is an external
entity to the cryptographic SoC. We assume that it is
impossible to extract the stored private key (present in secure
memory) on the device containing the JTAG interface.

Manufacturing Test/Firmware or Code Update at Manufactur-
er’s End We have considered the manufacturing environ-
ment to be controlled and the manufacturer’s test server to be
trusted. The device may be a fake one (or a clone) trying to get
authorized code or firmware updates through the JTAG
interface. The test server should allow only genuine devices
to have access to the updates. Hence, in this scenario, a one-way
entity authentication of the device to the Test Server is required.

The device needs to prove to the Test Server that it is in
possession of the correct private key, without revealing it to the
server. This is achieved through the use of the Schnorr protocol
to be employed in our secure test scheme. Here the prover is the
device with secure JTAG, while the verifier is the Test Server.
This is represented symbolically by the block diagram below:

A possible use-case for this scenario is system integration,
where the integrator procures VLSI chips from different
third-party vendors. He needs to make sure that each
chip is a genuine one, and not a fake one or clone which can
compromise the integrity of the complete system. This can
be achieved through the addition of a security feature to the
JTAG using an authentication mechanism.

In-the-field Update, Debug and Test When devices are
deployed in-the-field, the environment is considered uncon-
trolled and both the Test Server and the device with the
JTAG may be potential attackers. Hence, mutual entity
authentication is required between the device and the Test
Server. The Test Server might be a hacker or a malicious
user trying to extract the internal secrets from the device
through the test infrastructure. Similarly, the device may be
malicious or even a fake one, trying to procure unauthorized
code or firmware updates through the JTAG interface.

Hence, both the device and the Test Server need to prove
their identity to each other without revealing their secrets
(their private keys). For the mutual authentication using
ECC based Schnorr protocol, when the Secure JTAG is the
prover, the Test Server is the verifier. Similarly when the
Test Server is the prover, the Secure JTAG is the verifier.

This is represented graphically by the following block
diagram:

A possible application of this attacker model is the firm-
ware update of set-top boxes used in pay-TV subscriptions.
The user of the set-top box might be a possible attacker
trying to get an unauthorized update from the server using
the JTAG port to watch pay channels for free. Similarly, an
unauthorized update from a remote hacker using the JTAG
port might compromise the secret keys stored in the smart
card of the set-top box.

The Schnorr protocol itself is proven secure in a very
strong attacker model [30]. This means that no information
about the private key is leaked to the verifier or any of the
attackers that fit the attacker model in [30]. As a result, the
system can only be attacked by extracting the private key
through side-channel attacks (though our designs are pro-
tected against Simple Power Analysis), leaks during instal-
lation/generation of the keys, attacks on the CA facilities,
etc. In this paper we present an efficient implementation of
the Schnorr protocol that makes its use cost effective for low
cost JTAG devices. Side channel attacks on this implementa-
tion or attacks related to software bugs, etc. or not in scope
of this paper. We claim that our solution is secure in the two
scenarios described above, as it is a straightforward use of
the Schnorr protocol that is proven secure.

4.2 Secure Test Authentication Based on Schnorr Protocol

We use an enhanced version of ECC-based Schnorr Protocol
[30] as the public-key cryptographic protocol in our secure
JTAG test scheme. Various public-key implementations,
such as RSA or ECC, may be used to solve the key-
management problems present in previous secure JTAG
approaches. We chose ECC as it offers the same security
as RSA, with much smaller area footprint. Area overhead is
of critical importance, since we are constrained in terms of
silicon area required to incorporate security features into
JTAG, owing to the small test interface available in most
applications. Similarly, various protocols using ECC may
been used. We chose the Schnorr protocol as it is provably secure and allows efficient implementation on space-constrained hardware.

An added positive side-effect of Schnorr is that it is “zero-knowledge” and thus no information about the secret key of the prover leaks during a protocol run. The zero-knowledge property may be useful in an uncontrolled in-the-field code update, debug or test environment where the communication channel between the test server and JTAG is untrusted and the secret need not be shared or linked to a communicating entity. Moreover, Schnorr is a very established protocol, and is used in Radio Frequency Identification (RFID) protocols [1, 2]. The related ECC-based Schnorr authentication protocol [30] is described in appendix A.

4.3 Public Key Verification

When using public key cryptography for authentication purposes, it is essential to verify the authenticity of the prover’s public key. Traditionally, the link between a user’s public key and some identifier of the user is captured in a digital certificate that is signed by a trusted third party (e.g., certificate authority or CA). By verifying this certificate, a verifier is assured that the public key that is provided by the prover is genuine. This means that it is sufficient to have a copy of the CA’s public key in order to verify all public keys that are certified by the CA. It is clear that storing a single CA’s public key is far more practical than storing a collection of symmetric keys that are shared with each possible prover. Therefore, we argue that our protocol, although more resource consuming, does provide a more practical solution when compared with previous JTAG authentication mechanisms that are based on symmetric cryptography only.

We propose two modes of operation, one is purely offline and the other uses an online connection to a trusted Authentication Server (AS).

In the offline mode, we assume that every prover has a certified public key and this certificate is signed by a trusted CA. Every verifier has a copy of the CA’s public key. Before the actual Schnorr authentication protocol, the prover sends his certificate to the verifier. The verifier simply uses the CA’s public key to verify the certificate. In case, the verifier has access to a “clock”, the verifier can also check an optional expiration date inside the certificate. In case the JTAG device is the verifier, this clock will probably not be available and no expiration date can be verified. Note that in this scenario, it is not possible to revoke certificates, as it is not possible to use an online server to obtain revocation lists or use an Online Certificate Status Protocol (OCSP) like protocol.

In case the verifier has the possibility to contact the online trusted AS, we propose to use a simplified version of the OCSP protocol. The protocol steps are depicted in the next figure:

![Diagram](image)

\[
\text{Sig}_{\text{AS}}(\text{PK}_p \oplus \text{C} \oplus \text{ID}_p)
\]

\[
\text{Prover} \quad \text{ID}_p \rightarrow \text{Verifier}
\]

\[
\text{PK}_p \rightarrow \text{Check Sig on PK}_p \text{ using public-key of AS}
\]

\[\text{Sig}_{\text{AS}}\text{ denotes signature of the prover’s public key with the private key of the Authentication Server, ID}_p \text{ and PK}_p \text{ are the Identity and Public-key of the Prover respectively.}\]

The prover starts with sending its ID$_p$ and its public key PK$_p$ to the verifier. The verifier then initiates a call to the AS by sending a fresh random challenge C and the ID of the prover to the AS. The AS will now lookup the public key of the prover, check whether it is still valid, and if so send a signature on the XOR of PK$_p$ C and ID$_p$ back to the verifier. The verifier will only accept the public key received from the prover upon reception of a valid signature by the CA on the generated challenge PK$_p$ C1 ID$_p$. We are Xoring the ID, challenge and the public key (instead of appending) in order to make sure that we can sign this value without first using a cryptographic hash function. In case the messages we wish to sign becomes longer than the field length of the ECC module we use, we would first have to reduce this length by employing a cryptographic hash function (and potentially cropping the result). As the implementation of such a hash function would consume too much area, we have designed our protocol to operate without a hash function.

The signature scheme can be implemented using Elliptic Curve Digital Signature Algorithm (ECDSA). This consumes less area overhead than a 1024-bit RSA signature scheme. An area-efficient implementation is presented in [18]. In our implementation, we have modified the ECC Schnorr controller to allow ECDSA. The hashing involved in the ECDSA signature verification is avoided as we use 192-bit signatures (the same length as the message that is signed, which is the public-key of the prover). Through this public key certificate we protect the Schnorr protocol from man-in-the-middle attack too. Devices which have adequate resources (online connection to the authentication server) to support this authentication process can opt for an online mode, while other devices can have an offline mode of authentication.

In this paper, we provide two different implementations. One is over projective coordinates and another is over affine coordinates. In the first design, we do not implement an inversion module whereas it is included in the second design. Due to projective coordinates the first design involves very few inversions which are performed iteratively on a multiplier unit.
following Fermat’s little theorem. However, in affine coordinates inversion is performed at every iteration of the point multiplication algorithm. Thus, a dedicated inversion unit based on extended Euclidean algorithm is implemented which also helps efficient execution of ECDSA on our secure JTAG scheme. Here we provide implementation details for both designs which provide better design variations and the user can opt for one of them in practice.

5 Secure JTAG Implementation

5.1 Integration of the ECC-processor with JTAG

An important contribution in our paper is the integration of the ECC based Schnorr controller and ECC point multiplier with the JTAG interface along with the other modules. This has been done in a seamless manner so as not to affect the timing aspects of the IEEE 1149.1 JTAG standard, and also keeping the behavior of the TAP finite state machine (illustrated in Appendix F) unchanged.

Our proposed architecture is shown in Fig. 1. The ordinary JTAG circuitry is enclosed within dotted lines, and it is divided into its two main components: the TAP finite state machine and the instruction decoder. The Schnorr protocol (described in Appendix A), as well as the ECDSA signature authentication are performed by the Schnorr controller, placed in the center of Fig. 1. It interacts with a modified JTAG instruction decoder, ECC module, and a 192-bit random number generator (a Linear Feedback Shift Register). The base point coordinates (curve parameters) are fetched from an external non-volatile memory.

The system is supposed to be locked in the beginning. In order to unlock it, the tester must manipulate the JTAG inputs to enter the new ‘UNLOCK’ instruction. Then, the instruction decoder informs the Schnorr controller to start the protocol, by means of the ‘request unlock’ signal. As soon as the authenticity of the test server is verified, the Schnorr controller activates the ‘release unlock’ signal, informing the instruction decoder that other instructions can now be performed. For instance, if the system is unlocked, the design under test (DUT) boundary scan register can be controlled. Meanwhile, when ‘release unlock’ signal is not active, the instruction decoder sets the multiplexer ‘MUX1’ to always select the output from the multiplexer ‘MUX2’, which is controlled by the Schnorr controller, impeding the shift out of any DUT specific register.

During the protocol execution, the communication with test server consists of using the Schnorr shift registers (192 bits) to shift in and out information required for the protocol. For instance, the transmission of the intermediate values, ‘T1’ and ‘T2’ (Protocol in Appendix A) is performed by means of shifting out the value ‘T1’ (or ‘T2’) once the ECC point multiplication is finished. It is important to notice that the shifting is always controlled by the test server, and that the timing for executing point multiplications depends on the scalar multiplier. It means that the Schnorr controller must inform the test server that it has finished each operation of the protocol. This synchronization is achieved by always adding one flip-flop at the end of the Schnorr shift register that is set to ‘0’ if the information in the shift register is valid, otherwise the multiplexer ‘MUX2’ selects the TDI input and the synchronization flip-flop is set to ‘0’. Thus, the test server keeps on shifting at least this one bit to detect that the Schnorr
controller is ready for receiving the next data. The step-wise detailed ECC-based Schnorr protocol for Secure JTAG is described in Appendix B.

5.2 Implementation of the ECC processor

The exponentiations involved in the Schnorr protocol may be implemented using RSA or ECC. However, ECC involves much smaller bit lengths compared to RSA and is efficient in hardware. Hence we implement the Schnorr protocol using 192-bit ECC over prime fields which offers higher security compared with 1024-bit RSA. Highly efficient ECC and ECDSA implementations for constrained environments can be found in [28, 29]. However, in this work, we present two new designs which are optimized both for area and timing suited for integration with the standard JTAG.

We use the 192-bit NIST ECC curve P192 and work in prime fields (F_p). The curve parameters used in our ECC implementation is as follows [12]:

- p: The order of the prime field F_p.
- a, b: The coefficients of the elliptic curve y^2 = x^3 + ax + b.
- n: The (prime) order of the base point P.
- h: The cofactor.
- x, y: The x and y coordinates of P.

\[ P-192: \quad p = 2^{192} \cdot 2^{64} - 1, \quad a = -3, \quad h = 1 \]
\[ b = 0 \times 64210519 \quad E59C80E7 \quad 0FA7E9AB \quad 72243049 \quad FEB8DEEC \quad C146B9B1 \]
\[ n = 0 \times FFFFFFFF \quad FFFFFFFF \quad FFFFFFFF \quad 99DE836 \quad 146B9B1 \quad B4D22831 \]
\[ x = 0 \times 188DA80E \quad B03090F6 \quad 7CBF20EB \quad 43A18800 \quad F4FF0AFD \quad 82FF1012 \]
\[ y = 0 \times 07192B95 \quad FFC8DA78 \quad 631011ED \quad 6B24CDD5 \quad 73F977A1 \quad 1E794811 \]

The point operations over affine and projective coordinates are performed by standard formula taken from the literature, and are provided in Appendix C. In the projective coordinate, we represent a point as: (x = X/Z, y = Y/Z, c = 1, d = 1). In general, projective coordinates are introduced to avoid the relatively more costly inversion used in point-operation over affine coordinates. However, relative among the costs of multiplication and inversion in F_p varies on their implementations. For example, when modular multiplication is computed in a bit serial fashion, it leads to log_p number of iterations (clock cycles); whereas, when inversion is performed by binary Euclidean algorithm, it requires at most 2 log_p number of iterations (clock cycles). Following the above design technique, the point operations over affine coordinates outperform projective coordinates.

5.2.1 Design I: ECC over Projective Coordinates

In this implementation, we use the 1998 Cohen–Miyaji–Ono mixed coordinates for point addition [7] and the 2007 Bernstein–Lange formulae [4] for point doubling from Explicit Formula database for Short Weierstrass curves [8]. The equations are mentioned in Appendix C. To reduce area overhead, the adder and the Montgomery multiplier used in ECC have been optimized. The ordinary adder/subtractor (required for the intermediate operations of the Montgomery Multiplier) has been combined with the modular adder/subtractor (required for the addition/subtraction operations used to implement ECC in projective coordinates) using a 2-bit select signal. This helps reduce the area overhead further. The modules employed in our design are described below.

**Schnorr & ECDSA Controller Modules** Figure 2 shows the block diagram of the Design I implementation.

The Schnorr protocol consists of two main operations: ECC point multiplication and modular multiplication. In order to perform an ECC point multiplication, the Schnorr controller loads correctly the inputs of the ECC & ECDSA controller, and sets the “mode” signal to ‘0’. Then it disables the “reset” signal of the ECC & ECDSA controller so it can initiate the execution. Then the Montgomery multiplier and the modular/ordinary adder/subtractor blocks are used to perform the point multiplication. As it can be seen, the adder/subtractor block is shared between the ECC controller and the Montgomery multiplier. If the ECC controller is using it, it sets the “select” signal to ‘1’ and then it chooses the operation type by setting the “mod kind” signal (‘0’ for addition and ‘1’ for subtraction). Each ECC scalar multiplication is performed using the Montgomery Powering Ladder algorithm, which is also protected against Simple Power Analysis (SPA) attacks.

On the other hand, to perform a modular multiplication, we reuse the Montgomery multiplier. For that purpose we use the order of the prime number as modulus instead of the prime number itself. The Schnorr controller sets the mode to ‘1’ and then uses the ECC controller as an interface to the Montgomery multiplier block. This interfacing was implemented in order to reuse the ECC controller finite-state-machine.

The ECDSA operation is performed partially by the ECC & ECDSA controller and partially by the Schnorr controller. It first executes all the ECDSA steps which require only integer multiplications by setting the mode to ‘2’ and loading the signature into the ECDSA block. Then the Schnorr controller saves these intermediate values and reuses the ECC controller block to run the two final point multiplications. For executing the inversion present in the ECDSA protocol we used the Itoh-Tsujii algorithm [16] based on the Fermat’s little theorem that allows to execute inversions using a modular multiplier. The Pseudo-Random Number Generator (PRNG) in the diagram

 Springer

4/1/2013 1:12 AM
which generates the random numbers required in the Schnorr protocol is a 192-bit LFSR. We are reseeding the LFSR after every authentication execution, with a new seed to avoid starting it with the same initial value on power up, in order to prevent replay attacks. Efficient LFSR reseeding techniques [19, 23, 32] using seed storage methods or seed derivation from the modules of the design can be used for the purpose. For security, the LFSR length must be large enough to prevent brute-force attacks (192-bit in our design) and irreducible polynomials used for the feedback taps to have all possible sequences \(2^{192} - 1\), in our case. Moreover, the reseeding must be done quite often to prevent prediction of generated sequences (at the beginning of every authentication as in our case). The new seed value is loaded into the LFSR as soon as the 'request unlock' signal in Fig. 1 goes high. Alternatively, for enhanced security, True Random Number Generators (TRNGs) based on Fibonacci or Galois Ring Oscillators [17], which have similar area overhead as LFSRs and substantially high randomness and unpredictability properties, can also be employed.

Montgomery Multiplier Montgomery’s algorithm is the most common method for a fast implementation of modular multiplications.

Algorithm I in Appendix D presents an efficient implementation of this algorithm. As one can notice, the final comparison is optimized exploiting carry-save-adders (CSA). CSAs are used for the intermediate computations and then a full addition is performed to convert the final carry-save result into a conventional form, such as presented in the Algorithm 2, Appendix D, and Fig. 3.b. The CSA adders have indeed a small area and avoid carry propagation, i.e., are computed in constant time independently of the operands’ length.

However, as a modular adder/subtractor is needed for ECC, we have decided to use the initial algorithm. We have indeed modified the adder/subtractor block to have an ordinary addition/subtraction also, in order to use this block in our Montgomery multiplication implementation. Optimizing the area is indeed our main objective, so using existing resources is better than implementing CSA adders. As a consequence, our implementation takes more or less twice as many cycles, but it is the one that optimizes most of the area. In the end, we have managed to optimize the area by more than 16 %, in comparison with the RTL description of the original design with an unoptimized implementation of the adder/subtractor. This optimized arithmetic block is presented in Appendix E.

5.2.2 Design II: ECC Over Affine Coordinates

The execution of the Schnorr protocol and ECDSA consists of several finite field operations (including inversion) and operations on elliptic curves. In Weierstrass elliptic curve, a point is primarily defined over Affine \((x, y)\) coordinates which is further redefined over several Projective coordinates with the help of a third variable \((X, Y, Z)\) in order to avoid inversion in point operations performed by chord-and-tangent method. Explicit formulae are provided in Appendix C. A single inversion is eliminated by several \((4-12)\) multiplications in Projective coordinates — still research is going on for finding coordinate systems to lower down multiplications in a point operation.

However, the implementation technique also plays an important role for improving efficiency of elliptic curve operations under a resource constrained environment like JTAG. It is already described in Section 5.2 that delay of a binary inversion/division method is just twice that of a bit-serial multiplication where both of them are assumed to be implemented by simple adder circuits — demands area in the same decimal order (3 times). On the other hand, efficient implementation of modular (Montgomery) multiplication could be achieved through digit-serial (parallel) architecture which demands much more area (order of digit length) and may not be affordable in the application of secure JTAG implementation. Thus design II attempts to implement a compact and flexible architecture for executing Schnorr protocol and ECDSA over Affine coordinates. Besides, this design computes all modular operations directly on 2’s complement binary domain that avoids cost of domain conversions compared to the first design.

Flexible Datapath In order to reduce the complexity of the controller logic, design II consists of a flexible datapath having all arithmetic blocks. There are two top level controllers in the current secure JTAG implementation — namely ECDSA-controller and Schnorr-controller. These controllers generate instructions like **PointAdd**, **PointMult**, **FieldAdd**, **FieldMult**, **FieldInvert**, **FieldSquare**.
FieldMult, FieldInv, FieldSub. In the next lower level, there is an ECMULT-controller which primarily generates two instructions - PointAdd and PointDbl. All instructions generated by top level controllers are first checked by the ECMULT-controller which further passes through the next lower level. Except PointMult, all other instructions are directly executed by the datapath shown in Fig. 4. The instruction PointMult consists of PointAdd and PointDbl instructions which are generated in proper sequence by the ECMULT-controller. All controller logic in design II are realized as finite state-machines in which the final state sends a done signal to its predecessor. With the help of five temporary registers the datapath shown in Fig. 4 computes a point operation (point doubling or point addition) as a single instruction - in which case the output of an execution is stored and supplied back to the memory through x3 and y3 ports. On the other hand, the outputs for all other finite field operations are directly generated from individual arithmetic units. In order to execute PointAdd instruction the datapath takes the input data from 'x1', 'y1', 'x2', 'y2', 'p', and 'a' ports, whereas for executing individual finite field operation the ports are configured by the upper level controller logic.

Prime field Multiplication In this design we use Blakley multiplication which is based on the iterative execution of doubling and addition. All internal operations are performed in respective prime field, that is intermediate results are always in their reduced form. Hence, the costly final reductions are eliminated. The multiplier unit contained in the datapath block (Fig. 4) computes a multiplication \( ab \mod p \) in \( \log p \) number of clock cycles, assuming that both a and b also have lengths of \( \log p \).

Prime field Inversion and Division The prime field inversion and division could be efficiently computed by binary inversion division algorithm, which is based on binary Euclidian algorithm. The current design follows the implementation of such a unit that is described in [9]. The current module can compute one inversion (used on ECDSA) as well as one division (used to execute PointAdd and PointDbl) in 2 \( \log p \) number of clock cycles.

Design II executes a PointAdd instruction in 5 \( \log p \) + 6 clock cycles and PointDbl in 4 \( \log p \) + 8 clock cycles. The clock cycles required to execute a PointMult instruction is: \( \log k \* (4 \log p + 8) + (#k-1) \* (5 \log p + 6) \), where \( k \) represents the scalar multiplier in kP and #k indicates the Hamming weight of \( k \).

6 Results

We present here the area and timing results of our implementation. Both ASIC and FPGA results of the overall secure
Table 1: Hardware cost of secure JTAG

<table>
<thead>
<tr>
<th>Module</th>
<th>Design I</th>
<th>Design II</th>
<th>Design I</th>
<th>Design II</th>
</tr>
</thead>
<tbody>
<tr>
<td>Arithmetic unit (modular adder and subtractor)</td>
<td>1374</td>
<td>5128</td>
<td>164</td>
<td>311</td>
</tr>
<tr>
<td>Modular multiplier *</td>
<td>5152</td>
<td>7314</td>
<td>615</td>
<td>756</td>
</tr>
<tr>
<td>Inversion module</td>
<td>2431</td>
<td>1482</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Controller and data multiplexers</td>
<td>40190</td>
<td>10295</td>
<td>2189</td>
<td>531</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td><strong>46716</strong></td>
<td><strong>47050</strong></td>
<td><strong>2968</strong></td>
<td><strong>3080</strong></td>
</tr>
</tbody>
</table>

*Montgomery multiplier for Design I, while Blakley multiplier for Design II

JTAG design with the sub-modules are mentioned. Though our design is larger than earlier methods, owing to the use of public-key cryptography (as opposed to symmetric-key usage in the other approaches), this helps solve the key-management problem inherent in other approaches to a great extent.

6.1 Area Overhead

The ASIC area requirements in terms of gate equivalents (GEs) (synthesized with Synopsys Design Compiler v2009. 06 for a Faraday 130 nm library) for the modules used in our ECC implementation are given in Table 1. The FPGA Synthesis results on Xilinx ISE 12.4 (with Virtex 6 xc6vlx75t family) for the modules are also presented.

Hence, as shown in the table, we require a total of 46716 GEs for design I and 47050 GEs for Design II to implement the secure JTAG Scheme with the Schnorr and ECDSA controllers. We choose the solution described in [25] for having an estimated area overhead of our approach. The cost of that solution is 25 k gates. It means that our solution is around twice larger than the solution in [25]. Our solution is based on public-key cryptography, which inherently demands more hardware resources than symmetric-key based approaches. The area calculations do not consider the overhead of the Hash function implementation which takes around 10 k gates [29] for SHA-1, in case the message lengths become longer than the ECC field size (as mentioned in Section 4.3). However, this is not applicable for properly designed protocols, as in our case. It must be also noted that all secure authentication schemes including the one proposed in this paper require Non-volatile memory for storing cryptographic keys.

The area requirement in our designs can be reduced further by making use of a tiny custom microcontroller with an Instruction Set Extension (ISE), as in [22]. Here only the top-level ECDSA commands are managed with a processor. Moreover, replacing the Montgomery Multiplier (suitable for general prime-field operations) with more efficient multipliers employing Mersenne-like NIST prime reduction suitable for prime fields over $F_p$ (with the prime number $p$ being a Pseudo

Table 2: Detailed timing estimates

<table>
<thead>
<tr>
<th>Scenario</th>
<th>Operation</th>
<th># Clock cycles</th>
<th>Clock class</th>
</tr>
</thead>
<tbody>
<tr>
<td>1. One way Authentication (DUT is the prover)</td>
<td>Unlocking</td>
<td>13</td>
<td>13 Test clock</td>
</tr>
<tr>
<td></td>
<td>Time to shift data in and out Protocol (1 k.P* operation)</td>
<td>768</td>
<td>768 Test clock</td>
</tr>
<tr>
<td></td>
<td></td>
<td>3068150</td>
<td>240762 Functional clock</td>
</tr>
<tr>
<td>2. One way Authentication (DUT is the verifier)</td>
<td>Unlocking</td>
<td>13</td>
<td>13 Test clock</td>
</tr>
<tr>
<td></td>
<td>Time to shift data in and out Protocol (2 k.P* operation)</td>
<td>768</td>
<td>768 Test clock</td>
</tr>
<tr>
<td></td>
<td></td>
<td>6136692</td>
<td>482130 Functional clock</td>
</tr>
<tr>
<td>3. Mutual(Two-way) Authentication</td>
<td>Unlocking</td>
<td>13</td>
<td>13 Test clock</td>
</tr>
<tr>
<td></td>
<td>Time to shift data in and out Protocol (3 k.P* operations each for prover and verifier)</td>
<td>960</td>
<td>960 Test clock</td>
</tr>
<tr>
<td></td>
<td></td>
<td>9204842</td>
<td>722892 Functional clock</td>
</tr>
<tr>
<td>4. ECDSA</td>
<td>Unlocking</td>
<td>13</td>
<td>13 Test clock</td>
</tr>
<tr>
<td></td>
<td>Time to shift data in and out Protocol (2 k.P* operation)</td>
<td>576</td>
<td>576 Test clock</td>
</tr>
<tr>
<td></td>
<td></td>
<td>6137075</td>
<td>482324 Functional clock</td>
</tr>
</tbody>
</table>

* 'k.P' indicates one Elliptic Curve Scalar Multiplication.
Mersenne number as used in NIST curves) can also help reduce the execution time for an Elliptic Curve scalar multiplication.

There are of course much more compact implementations available in the literature, for instance, the 192-bit ECDSA implementation in [22] employing the same NIST recommended curve as in our case requires only 19.1 KGEs (thus consuming 23.5% less than the approach in [25]) and 859,188 cycles in total for the combined operations of ECDSA, Hash and Random number generation required for the protocol execution. Similarly the most area efficient 163-bit ECC implementation in [20] consumes only 12.5 KGEs (thus taking half the area in [25]) and 275,816 cycles for one Elliptic Curve scalar multiplication. In this paper, though we did not achieve such high compactness, we have shown the feasibility of integration of the JTAG with the ECC and ECDSA modules by presenting combined area and timing results which have limited overheads.

6.2 Timing Overhead

The impact of the proposed solution in the use of the JTAG standard consists of an initial delay for executing the Schnorr protocol/ECDSA. Once the authentication and the signature verification steps are finished, the JTAG is unlocked and the JTAG instructions can be used without any timing overhead.

The initial delay is due to three main operations: 1) the time to request the unlock (associated with the time to insert the instruction ‘UNLOCK’) and the time to release the lock; 2) the time to shift in the protocol inputs and shift out the protocol outputs using the JTAG controller; and 3) the time to perform the protocol operations, including ECC point multiplications, ordinary multiplications and additional operations to communicate between the dedicated Schnorr protocol modules. The first two operation types are measured in test clock cycles that depend on the JTAG frequency, while the last operation type is measured in functional clock cycles, the functional clock being usually faster than the test clock. The timing overhead is presented in Table 2, where we have distinct four scenarios. The first scenario is a one-way authentication (manufacturer environment in Appendix A) in which the DUT acts as prover (A) and the test server acts as verifier (B). The only scalar (point) multiplication performed for A is \( n_a P \) for generating \( T_A \). The second scenario is a one-way authentication (manufacturer environment in Appendix A), but the roles of prover and verifier are reversed. Here the DUT acting as the verifier B performs two scalar multiplications \( s \cdot P \) and \( n_b P_B \). The third case is the two-way authentication (in the field update in Appendix A). Here, both A and B perform three scalar multiplications \( n_A P, s_1 P \) and \( n^*_b P_B \) for A, and \( n_b P, s_2 P \) and \( n^*_b P_B \) for B. Finally, the last one is the timing overhead associated with the execution of the ECDSA signature verification, which requires two scalar multiplications.

For having an estimation of time in milliseconds, we suppose a 100 MHz clock frequency for the JTAG test clock, and 115 MHz as functional clock frequency for Design I and 123 MHz for Design II, as shown in Table 3. The functional clock frequency is the maximum operating frequency obtained from FPGA synthesis. The test clock and the functional clock can be also the same without involving any design change. Considering the mutual authentication scenario with ECDSA signature verification, Design I has an initial delay of 133.42 ms while Design II has an initial delay of 9.83 ms.

7 Conclusion

In this paper, we have presented the implementation of a secure test scheme integrating the provably secure Schnorr protocol with JTAG-based testing. The key management problem inherent in previous symmetric-key based approaches is overcome through the use of public-key cryptography in our test scheme. Moreover, we present detailed hardware implementations, area and timing results for our ECC and ECDSA-based authentication protocol. To the best of our knowledge, this is the first complete work for securing the JTAG interface using public-key cryptography which also provides mutual authentication between the device and the tester.

Acknowledgement This work was supported in part by the Research Council KU Leuven: GOA TENSE (GOA/1/1005), by the Flemish iMinds projects, and by the European Commission through the ICT programme under contract ICT-2007-216676 ECRYPT II. In addition, this work is supported in part by the Flemish Government, FWO G.059.12N, by the Hercules Foundation AKUL/11/19, and by the European Commission through the ICT programme under FP7-ICT-2011-8 HINT. Amritabha Das was initially funded by the Erasmus Mundus External Cooperation Window Lot 15 (EMECW15) when part of the work was performed. Santosh Ghosh is a beneficiary of a mobility grant from the Belgian Federal Science Policy Office co-funded by the Marie Curie Actions from the European Commission. The authors are thankful to Dr. Junfeng Fan (ESAT/ESOC, KU Leuven) for his useful comments in improving the paper.

Appendix A

ECC-based Schnorr Protocol

In the manufacturer environment scenario, A is the prover (Secure JTAG) and B is the verifier (Test server):

\[ P_b \] is the public key of A and \( k_a \) is the private key of A, which are related by:

\[ P_a = k_a P \]

where P is the initial point on the Elliptic curve (base point), which is public, \( k_a P \) represents a point multiplication of scalar \( k_a \) with base-point P.

Goal: B wants to be ensured the identity of A, in other words A knows \( k_a \).
Protocol:

1) A generates a random number $n_a$ and sends an intermediate value $T_a$ (point multiplication of $n_a$ and $P$) to $B$

\[
\begin{align*}
A & : \quad T_a \quad n_a \quad P \\
B & : \quad n_b \\
\end{align*}
\]

2) $B$ generates a random number $n_b$ and sends it to $A$

\[
\begin{align*}
A & : \quad n_b \\
B & : \quad s \quad n_a \quad k_a \quad n_b, \quad n_a' \\
A & : \quad s_1 \quad n_b' \quad k_b \quad n_a
\end{align*}
\]

3) $A$ generates another random number $n_a'$ and sends it along with sends $'s'$ to $B$, $B$ sends $'s_1'$ to $A$

\[
\begin{align*}
A & : \quad B : \quad s \quad n_a \quad k_a \quad n_b, \quad n_a' \\
A & : \quad B : \quad s_1 \quad n_b' \quad k_b \quad n_a
\end{align*}
\]

$B$ can verify that $A$ is $A$ by calculating:

\[
\begin{align*}
s \quad P \quad T_a \quad n_b \quad P \\
\end{align*}
\]

Similarly, $A$ can verify that $B$ is $B$ by calculating:

\[
\begin{align*}
s_1 \quad P \quad T_b \quad n_a \quad P \\
\end{align*}
\]

Thus $B$ verifies the identity of $A$ by only knowing $A$'s public key $P_a$, and $A$ verifies the identity of $B$ by only knowing $B$'s public key $P_b$.

Moreover, $n_a.n_b.P$ can be used as a session key $K$ to encrypt all future communication between the security chip and test server. The reason behind this is that $A$ knows $n_a.P$ and $n_a'$, while $B$ knows $n_b$ and $n_b'.P$ from which they can construct $K$, but any unauthorized party cannot do so. This may be particularly useful for instance, in the case of pay-TV updates happening on the set-top box from a remote server using a network communication, where an eavesdropper can listen to the channel in between.

Appendix B

ECC based Schnorr for secure JTAG

The execution of the Schnorr protocol is now explained in some detail using the block diagram below:

1) First, the JTAG public key $P_a$ is calculated. For this, the ECC controller module sends the private JTAG key $k_a$ (from on-chip storage) and the base point coordinates and other curve parameters (prime number, $R \ast R \mod n$) from the non-volatile memory to the ECC point multiplier module. It then instructs the point multiplier module to start an ECC point multiplication operation.

2) The ECC point multiplier then performs a point multiplication of the scalar $k_a$ with the base point $P$ and returns the result ($P_a$) back to the ECC controller module. This result is stored in a 192-bit temporary register inside the controller module.

3) A 192-bit random number $n_a$ is generated by the on-chip random-number generator and sent to the ECC controller module.

4) The ECC controller module then sends this $n_a$ and the base point coordinates and other curve parameters from the non-volatile memory to the ECC point multiplier
module. It then instructs the point multiplier module to start an ECC point multiplication operation.

5) The ECC point multiplier then performs a point multiplication of the scalar \( n_b \) with the base point \( P \) and returns the result (\( 'T_n' \)) back to the ECC controller module. This result is stored in another temporary register inside the controller module.

6) The test server then generates a 192-bit random number \( n_b \) and sends this to the JTAG module bit-by-bit through the TDI input. This is then stored in the 192-bit shift (data) register of the JTAG.

7) \( n_b \) and the private key of the JTAG (\( k_c \)) are transferred to the ECC.

8) For the integer multiplication of \( k_c \) with \( n_b \), the ECC controller instructs the arithmetic module inside the point multiplier module to perform a modular multiplication of \( k_c \) with \( n_b \) using the ‘order of the prime’ (fetched from the non-volatile memory storage of curve parameters) as the modulus (this is equivalent to integer multiplication of \( k_c \) with \( n_b \)). The result is stored back in a 192-bit register inside the ECC controller module.

9) A modular addition of \( n_b \) with \( k_c \cdot n_b \) is then performed in the arithmetic block inside the point multiplier module. For this, the appropriate control is provided from the ECC controller which also stores the result of the computation (‘\( s \)’) in the same 192-bit register.

10) The ECC controller module then sends ‘\( s \)’ and the base point coordinates and other curve parameters from the non-volatile memory to the ECC point multiplier module. It then instructs the point multiplier module to start an ECC point multiplication operation.

11) The ECC point multiplier then performs a point multiplication of the scalar ‘\( s \)’ with the base point \( P \) and returns the result back to the ECC controller module. This result is stored in the same register inside the controller module.

12) Next, the ECC controller module then sends \( n_b \) and the public key of the JTAG (\( P_a \)) and other curve parameters from the non-volatile memory to the ECC point multiplier module. It then instructs the point multiplier module to start an ECC point multiplication operation.

13) The ECC point multiplier then performs a point multiplication of the scalar \( n_b \) with \( P_a \) and returns the result back to the ECC controller module. This result is stored in another temporary register inside the controller module.

14) A modular addition of the stored ‘\( T_n \)’ with \( n_b \cdot P_a \) is then performed in the arithmetic block inside the point multiplier module. For this, the appropriate control is provided from the ECC controller which also stores the result of the computation in the same 192-bit register.

15) The result of the above computation (\( T_n + n_b \cdot P_a \)) is then compared with \( s \cdot P \) computed and stored earlier inside the comparator module in the ECC controller module. If they match, then only the JTAG is allowed to enter the test and debug modes, otherwise it remains in the bypass mode.

### Appendix C

Point Addition and Point Doubling in Affine Coordinates:

When \( P = (x_P,y_P) \) and \( Q = (x_Q,y_Q) \) are not negative of each other, then \( P + Q = R \) where

\[
\begin{align*}
x_R &= s^2 x_P x_Q + y_P y_Q \\
y_R &= s^2 y_P + x_P x_Q
\end{align*}
\]

Note that \( s \) is the slope of the line through \( P \) and \( Q \).

Similarly, When \( y_p \) is not 0, then \( 2P = R \) where

\[
\begin{align*}
x_R &= 3x_P^2 - a = 2y_R \\
y_R &= 2x_P x_Q + y_P y_Q \\
x_R &= 3x_P^2 + a = 2y_R
\end{align*}
\]

Recall that ‘\( a \)’ is one of the parameters chosen with the elliptic curve and that \( a \) is the tangent on the point \( P \).

Formulae for ECC Point Addition and Doubling in Projective Coordinates:

<table>
<thead>
<tr>
<th>Table 4</th>
<th>Explicit formulae</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Point Addition</strong></td>
<td><strong>Point doubling</strong></td>
</tr>
<tr>
<td>Cost</td>
<td>12 Field Multiplications + 2 Squarings + 6 additions + 1 shift.</td>
</tr>
<tr>
<td>Efficient elliptic curve exponentiation using mixed coordinates</td>
<td></td>
</tr>
<tr>
<td><strong>Formulæ</strong></td>
<td></td>
</tr>
<tr>
<td>( Y_1Z_2 = Y_1*Z_2 )</td>
<td>( w = 3*(X_1 - Z_1)*(X_1 + Z_1) )</td>
</tr>
<tr>
<td>( X_1Z_2 = X_1*Z_2 )</td>
<td>( s = 2<em>Y_1</em>Z_1 )</td>
</tr>
<tr>
<td>( Z_1Z_2 = Z_1*Z_2 )</td>
<td>( s_{ss} = s^{*ss} )</td>
</tr>
<tr>
<td>( u = Y_2*Z_1 - Y_1Z_2 )</td>
<td>( s_{ss} = s^{*ss} )</td>
</tr>
<tr>
<td>( uu = u*u )</td>
<td>( R = Y_1*s )</td>
</tr>
<tr>
<td>( v = X_2*Z_1 - X_1Z_2 )</td>
<td>( RR = R*R )</td>
</tr>
<tr>
<td>( vv = v*v )</td>
<td>( B = 2<em>X_1</em>R )</td>
</tr>
<tr>
<td>( vvv = v^*vv )</td>
<td>( h = w+w - 2*B )</td>
</tr>
<tr>
<td>( R = vvv*X_1Z_2 )</td>
<td>( X_3 = h*s )</td>
</tr>
<tr>
<td>( A = uu*Z_1Z_2 )</td>
<td>( vvv - 2*R )</td>
</tr>
<tr>
<td>( Y_3 = w*(B - h) - 2*R )</td>
<td>( Y_3 = s_{ss} )</td>
</tr>
<tr>
<td>( X_3 = v^*A )</td>
<td></td>
</tr>
<tr>
<td>( Z_3 = vvv*Z_1Z_2 )</td>
<td></td>
</tr>
</tbody>
</table>

Here ‘\( * \)' indicates modular multiplication which in our case has been implemented using the Montgomery Multiplier. The addition and subtraction operations denoted here are all modular in nature. Using these set of formulæ have the additional advantage that the computations are not dependent on the value of parameters ‘\( a \)’ and ‘\( b \)’.
Appendix D

Algorithm 1:
Modified Montgomery modular multiplication

Input: A, B, M
Output: R = X Y 2^(r-2) mod M
a_i : i^th bit of A, s_0 : LSB of S
1. S = 0, C = 0;
2. for i=0 to n+1
   S, C = S + C + a_i x B;
   S, C = S + C + s_0 x M;
   S = S div 2;
   C = C div 2;
3. R = S + C
4. if R > M then R = R - M
5. return R

Algorithm 2:
Montgomery modular multiplication

Input: A, B, M
Output: R = X Y 2^(r-2) mod M
a_i : i^th bit of A
r_0 : LSB of R
1. R = 0;
2. for i=0 to n-1
   R = R + a_i x B;
   R = R + r_0 x M;
   R = R div 2;
3. if R > M then R = R - M
4. return R

Appendix E

Modular adder/subtractor A "naive" implementation of a modular addition A+B mod P is presented in Fig. 5.a; it consists in computing A+B, and then subtracting P to this result. A comparison between these two intermediate results allows choosing which one to use for the final result. However, this comparator could be avoided by observing the carry (borrow) out signal of addition (subtraction) which could be realized by a single OR gate (instead of a 192-bit comparator) such as presented in Fig. 5.b. Concerning the subtraction, the principle is the same: computing A-B and then A-B+P, and comparing these intermediate results to choose which one to use for the final result. A naive and an optimized version of the subtraction are presented in Fig. 5.e and d.

The two optimized versions (Fig. 5.b and d) have been combined to produce an optimized modular adder/subtractor block such as depicted in Fig. 5.e. In this architecture an input (op type) is used to generate whether an addition or a subtraction (put to 1 for an addition and 0 for a subtraction). This architecture uses two adder/subtractor blocks (i.e., an addition combined with the inversion (or not) of the second operand using XOR gates and the input carry to '1' (or '0')) and the optimized comparison implementation depicted earlier. Concerning the architecture used for the additions/subtractions, we have used the library provided by the synthesizer which includes highly optimized RTL for arithmetic building blocks.

In the end, an efficient adder architecture combined with an optimized comparison implantation have led us optimize
the area of more than 90 %, by comparison with the area obtained from a VHDL file directly generated by our Gezel implementation.

Appendix F

16-cycle JTAG TAP Controller State Diagram

Fig. 6 TAP controller state diagram

References

5. Biskey RF and Fosik BB. Protected JTAG. Proceedings of the 2006 International Conference on Parallel Processing Workshops (ICPPW’06), 0-7695-2637-3/06
8. Explicit Formula Database http://www.hyperelliptic.org/EFD/g1p/auto-shortw.html
10. Greene meier L (August 30, 2007) iPhone Hacks Annoy AT&T but are Unlikely to Bruise Apple. Scientific American

Springer
Amritabh Das received the B.Tech degree in Electronics and Communication Engineering from Kalyani Government Engineering College, West Bengal, India, in 2003, and the M.Tech degree in Instrumentation and Electronics Engineering from Jadavpur University, Kolkata, India, in 2009. From 2003 to 2007, he worked as an Assistant Systems Engineer at Tata Consultancy Services (TCS), Kolkata, India, and as an Electronics and Instrumentation Engineer at Bharat Heavy Electricals Limited (BHEL), India. He is currently pursuing the Ph.D. degree at the Department of Electrical Engineering, Katholieke Universiteit Leuven (KU Leuven), Belgium. His research interests include secure design-for-testability, electronic design automation, hardware cryptography, embedded systems, and hardware/software co-design and co-simulation.

Jean Da Rolz received the B.S degree in Computer Engineering from the Institute of Informatics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil, in 2008. He also received the B.S degree in Telecommunications Engineering from the Ecole Nationale Superieure d'Informatique et Mathematiques Appliquees, Grenoble, France, in 2009. He recently completed his Ph.D. degree at LIRMM (Laboratoire d'Informatique, de Robotique et de Microelectronique de Montpellier), Montpellier, France. His interests include cryptography, electronic design automation and telecommunications.

Santosh Ghosh obtained his B.Tech degree in Computer Science and Engineering from Haldia Institute of Technology, Haldia, WB, India in 2002. He obtained his M.S. and Ph.D. in 2008 and 2011 from the Department of Computer Sc and Engg, Indian Institute of Technology Kharagpur, India. He was a Post-doctoral Researcher at ESAT/COSIC, KU Leuven, Belgium. His research interests include cryptography and network security, VLSI design of crypsy systems, side-channel analysis, and secure testing. He has authored about 6 Journal and 25 Conference papers and has served as Reviewers of several International Conferences and Journals. He is presently working at the Security Center of Excellence (SeCoE), Intel Corporation, 2111 NE 25th Avenue, Hillsboro, Oregon, 97124.

Stefaan Seys received the degree of Master in Electrical Engineering (Burgerlijk Ingenieur Elektrotechniek) from the KU Leuven (Belgium) in 2000. In August 2000, Stefaan started working in the research group COSIC (Computer Security and Industrial Cryptography) at the Department of Electrical Engineering (ESAT) of the Katholieke Universiteit Leuven. He completed his Doctorate in Applied Sciences in May 2006. His research interests include cryptography, wireless security and computer security.

Sophie Dupuis received the M.Sc degree in Integrated Systems Architectures and Micro-Electronics and the Ph.D degree in Computer Science from the Pierre & Marie Curie University (Paris VI), Paris, France, in 2004 and 2009, respectively. Her thesis concerns the optimization of arithmetic datapaths through the use of redundant arithmetic. She is currently an Assistant Professor at the Montpellier University in the Montpellier Laboratory of Informatics, Robotics, and Microelectronics (LIRMM). Her research interests include VLSI design (hardware description languages, design reuse, circuit design, structured ASICs), CAD tools (design methodologies, design automation, datapath synthesis, combinatorial optimization) and computer arithmetic (elementary functions, number representations).

Giorgio Di Natale received the PhD in Computer Engineering from the Politecnico di Torino (Italy) in 2003. Currently, he is a researcher for the French National Research Center (CNRS) at LIRMM (Montpellier). He has published articles in publications spanning a broad range of diverse disciplines, including memory testing, fault tolerance, software implemented hardware fault tolerance and secure circuits. He serves the Test Technology Technical Council (TTTC) of IEEE Computer Society as Web Master.

Marie-Lise Flottes received her PhD degree in Electrical Engineering in 1990 from the University of Montpellier. She is currently researcher for the French National Research Center (CNRS). Since 1990, she has been conducting research in the domain of digital system testing at LIRMM (Laboratory of Informatics, Robotics, and Microelectronics, Montpellier, France). Her research interests include design for testability, testability and dependability of secure circuits, data compression and test management for SoC and SIP.

Bruno Roureyre received the master degree in Mathematics in 1978, Docteurat degree (PhD) degree in CAD in 1984, all degrees from the University of Montpellier. Currently, he is Professor at the University of Montpellier and conducts his research at LIRMM. His research interests include several aspects of CAD for digital circuits and are particularly oriented toward optimization, verification, test, and test synthesis of digital and secure circuits.
Ingrid Verbauwhede received the electrical engineering degree and PhD degree from the KU Leuven, Belgium, in 1991. From 1992 to 1994, she was a postdoctoral researcher and visiting lecturer at the University of California, Berkeley. From 1994 to 1998, she worked for TCSI and ATMEL in Berkeley, California. In 1998, she joined the faculty of University of California, Los Angeles (UCLA). She is currently a professor at the KU Leuven and an adjunct professor at UCLA. At KU Leuven, she is a co-director of the Computer Security and Industrial Cryptography (COSIC) Laboratory. Her research interests include circuits, processor architectures and design methodologies for real-time embedded systems for security, cryptography, digital signal processing, and wireless communications. This includes the influence of new technologies and new circuit solutions on the design of next-generation systems on chip. She was the program chair of CHES’07, ASAP’08, and ISLPED’02. She was also the general chair of ISLPED’03 and CHES’12. She was a member of the executive committee of DAC’05 and ’06. She is a Fellow of the IEEE.