# A multi-chip pulse-based neuromorphic infrastructure and its application to a cortical model of orientation selectivity

Elisabetta Chicca *Student Member, IEEE,* Adrian M. Whatley, Patrick Lichtsteiner, Vittorio Dante, Tobias Delbruck, Paolo Del Giudice, Rodney J. Douglas, and Giacomo Indiveri *Member, IEEE* 

### Abstract

The growing interest in pulse-mode processing by neural networks is encouraging the development of hardware implementations of massively parallel networks of Integrate-and-Fire (I&F) neurons distributed over multiple chips. Address-event representation (AER) has long been considered a convenient transmission protocol for spike-based neuromorphic devices. One long-needed feature missing from AER-based systems is the ability to acquire data from complex neuromorphic systems and to stimulate them with suitable data. We have implemented a general-purpose solution in the form of a PCI board (the PCI-AER board) supported by software. We describe the main characteristics of the PCI-AER board and of the related supporting software. To demonstrate the full functionality of the PCI-AER infrastructure, we present a reconfigurable multi-chip neuromorphic system for feature selectivity that models the orientation tuning properties of cortical neurons.

### **Index Terms**

PCI-AER, Address Event Representation, AER, WTA, orientation tuning, cooperative-competitive, asynchronous, neural chips, neural networks, VLSI

Manuscript received \_\_\_\_\_; revised \_\_\_\_\_. This work was supported in part by the EU grants ALAVLSI (IST-2001-38099), CAVIAR (IST-2001-34124) and DAISY (FP6-2005-015803) and in part by the Swiss National Science Foundation (PMPD2-110298/1). We thank our colleagues at the Institute of Neuroinformatics for designing and providing the DAC board (S. Zahnd and M. Oster), for help in implementing the stimuli (F. Roth) and analyzing the data (C. Girardin).

V. Dante and P. Del Giudice are with the Italian National Institute of Health. E. Chicca, A. M. Whatley, P. Lichtsteiner, T. Delbruck, G. Indiveri and R. J. Douglas are with the Institute of Neuroinformatics, University of Zürich and ETH Zürich.


#### I. INTRODUCTION

Networks of Integrate-and-Fire (I&F) neurons have been shown to exhibit a wide range of useful computational properties, including feature binding, segmentation, pattern recognition, onset detection, and input prediction [1]. Implementing these functionalities in VLSI circuits could lead to efficient hardware systems capable of solving complex sensory processing tasks in real time. I&F neuron circuits are very well suited for VLSI implementation [2]–[7], and large VLSI networks of I&F neurons can already be implemented on single chips using today's technology. However, pulse-based neural networks implemented on multi-chip systems offer greater computational power and higher flexibility than single-chip systems. Because inter-chip connectivity is limited by the small number of input-output connections available with standard chip packaging technologies, large multi-chip networks require time-multiplexing schemes.

In recent years, new asynchronous communication protocols have emerged that allow aVLSI neurons to transmit their activity across chips using pulse-frequency modulated signals (in the form of events, or spikes). One of the most common asynchronous communication protocols used in these types of systems is the Address-Event Representation (AER) communication protocol [8]–[11]. In this representation, input and output signals are real-time digital events that carry analog information in their temporal relationships (inter-spike intervals). Each event is represented by a binary word encoding the address of the sending node.
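The time-multiplexing idea behind AER can be illustrated with a minimal Python sketch (it is not part of the PCI-AER software): the spike trains of several neurons are merged into one time-ordered stream of (time, address) pairs, and a receiver recovers each neuron's train by decoding the addresses. On a real AE bus the time is implicit in when the event appears; here it is carried explicitly only for illustration, and the function names are hypothetical.

```python
import heapq
from collections import defaultdict


def multiplex(spike_trains):
    """Merge per-neuron spike trains into one time-ordered address-event stream.

    spike_trains: dict mapping neuron address (int) -> sorted list of spike times (s).
    Returns a list of (time, address) pairs, ordered by time.
    """
    streams = [[(t, addr) for t in times] for addr, times in spike_trains.items()]
    return list(heapq.merge(*streams))


def demultiplex(events):
    """Recover each neuron's spike train from the shared stream by its address."""
    trains = defaultdict(list)
    for t, addr in events:
        trains[addr].append(t)
    return dict(trains)


trains = {3: [0.010, 0.030], 7: [0.020]}
bus = multiplex(trains)
print(bus)                           # [(0.01, 3), (0.02, 7), (0.03, 3)]
print(demultiplex(bus) == trains)    # True
```

Nothing but the address and the timing of each event crosses the bus, yet both spike trains are recovered intact.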

The activity of biological neurons is sparse in time, with typical firing rates ranging from a few spikes per second to a few hundred spikes per second. The speed of digital buses (tens of megahertz) allows the outputs of many VLSI neurons firing at these biologically typical rates to be multiplexed over one Address-Event (AE) bus. To further reduce the bandwidth required on the AE bus, local connectivity can be hardwired on-chip [5], [6]. To handle cases in which multiple sending nodes attempt to transmit their addresses at exactly the same time (event collisions), on-chip arbitration schemes can be used [8], [12]–[14].
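As a rough, back-of-the-envelope check of this multiplexing argument, the sketch below estimates how many neurons can share one bus. It assumes, hypothetically, that the bus carries one event per cycle (so a 10 MHz bus carries 10 M events/s) and that all neurons fire at the same mean rate; arbitration overhead is folded into a single utilization factor.

```python
def max_neurons_on_bus(bus_rate_eps, mean_firing_rate_hz, max_utilization=1.0):
    """Upper bound on how many neurons can share one AE bus.

    bus_rate_eps: events per second the bus can carry (assumed value).
    mean_firing_rate_hz: average firing rate of each neuron.
    max_utilization: fraction of bus capacity we allow ourselves to use;
        keeping it below 1 leaves headroom for bursts and collisions.
    """
    return int(bus_rate_eps * max_utilization // mean_firing_rate_hz)


print(max_neurons_on_bus(10_000_000, 100))        # 100000
print(max_neurons_on_bus(10_000_000, 100, 0.5))   # 50000
```

Even with generous headroom, tens of thousands of neurons firing at 100 Hz fit on a single bus, which is why sparse biological firing rates make time-multiplexing practical.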

Chips that communicate using the AER communication protocol can be divided into *senders* with AER output only (*e.g.* silicon retinas [15], [16], or silicon cochleas [17]), *receivers* with AER input only [18], and *transceiver* chips, which are both senders and receivers [3], [5]. Systems containing more than one AER sender chip can be assembled using off-chip arbitration.

One of the earliest multi-chip systems using the AER communication protocol, a silicon model of stereoscopic vision, was implemented by Misha Mahowald [8]. The system, consisting of three silicon chips interconnected with asynchronous digital buses, was able to extract, in real time, depth information from visual stimuli detected by two silicon retinas. Then as now, logic analyzers were often used to monitor AE buses. While still useful for debugging AE protocol communication, they have several disadvantages for monitoring purposes. Good logic analyzers are typically bulky and heavy, and therefore not very portable, and too expensive to equip every researcher's bench with one. They also do not usually permit on-line real-time monitoring, since data cannot be downloaded while it is being acquired into acquisition memory. This makes logic analyzers unsuitable for experiments that incorporate conventional software-based algorithms into the processing loop, *e.g.* certain on-line learning experiments. General-purpose data acquisition (digital I/O) boards are not ideal either: they are not built with asynchronous buses in mind, and each individual event handshake must be handled by software, which makes them very slow. On the AE generation side, the counterpart of the logic analyzer is the pattern generator. Pattern generators are also not built for asynchronous buses: they cannot wait for a handshake to complete and can only emit a fixed programmed pattern. Like logic analyzers, they cannot take part in on-line software-in-the-loop experiments, and they are also very expensive.

Infrastructures for constructing multi-chip pulse-based neuromorphic systems based on AER have been further developed by several researchers [9], [12], [19]–[22]. A wide range of multi-chip AER systems has been presented in the past, using AER infrastructures that range from very bulky and highly complex general-purpose solutions [18], [19], [23] to custom solutions in the form of dedicated printed-circuit boards with microcontrollers and/or look-up tables [4], [24], [25]. In addition, a new set of general-purpose AER boards with USB and USB2 interfaces has recently been proposed [22], [26], [27]. These boards represent a good compromise between general-purpose functionality and compactness. However, because they are typically placed between AER chips in the signal processing path, they often do not have access to the address-events of all the chips in the system. Furthermore, each individual board often has a limited set of functionalities (e.g. monitoring address-events from a sender, generating and sending synthetic address-events to a receiver, merging address-events from two senders into a receiver input, or mapping address-events from one address space to another), or requires reprogramming at the FPGA/VHDL level to assume one of these particular functions. In this paper we present a general-purpose solution in the form of a PCI board (the PCI-AER board) that has all of these functionalities, can connect up to four senders with up to four receivers, has access to the global AER address space used by the system, and has a well-defined software interface. Using a single PCI-AER board rather than many smaller USB-AER boards has two disadvantages: it requires a PC workstation to be present, even when only mapping is required, and it limits to some extent the overall size of the AER system that can be constructed. However, it allows convenient and rapid prototyping (e.g. by stimulating, monitoring and/or experimenting with different address-space mappings) and seamless integration of software algorithms [28] (e.g. algorithms that implement learning or that change the network topology based on the system's activity). The PCI-AER board is therefore an ideal tool for developing AER neuromorphic models of biological sensors and neocortical processing structures. Specifically, we propose to use the board to study a computational module based on a network of spiking neurons with cooperative-competitive interactions.

We are ultimately interested in developing neuromorphic systems that reproduce some characteristics of neocortical processing modules. Despite significant differences in function across the various cortical areas, the pattern of neuronal connections within each area is remarkably similar [29]. This regular structure suggests that the cortex may use a common core processing circuit, or *canonical microcircuit*, that can be tuned to perform specific tasks and used in a modular fashion for implementing different functionalities [29], [30]. The canonical microcircuit, and its later extensions, emphasize the role of first-order recurrent connections between cortical neurons. These recurrent connections support soft winner-take-all (WTA) mechanisms, in which networks of neurons participate collectively in generating an appropriate interpretation of their input.

The computational abilities arising from soft WTA mechanisms are especially useful for feature extraction and pattern classification problems. In the second half of this paper, we describe an application example comprising an address-event temporally differentiating vision sensor interfaced via the PCI-AER board to a VLSI device with a cooperative-competitive network of spiking neurons. We apply this AER-based vision system to the implementation and comparison of two models of orientation selectivity. The mechanisms responsible for orientation selectivity have been controversial since its discovery by Hubel and Wiesel [31]. Originally it was believed that the orientation selectivity of simple cells arose primarily from the feed-forward convergence of thalamic input (feed-forward model). Subsequent experimental studies suggested that this contribution alone is insufficient to account for all properties of orientation tuning observed in the visual cortex [32]–[34], leading to the proposal that recurrent intracortical excitation and inhibition are involved in orientation selectivity (feed-back model). The origin of orientation selectivity in primary visual cortex has been extensively studied as a means to understand cortical circuitry and cortical computation [33], [35]–[38], and several hardware models of orientation selectivity have been proposed and fabricated in monolithic [39]–[41] or multi-chip configurations [3], [6], [42]–[46]. An advantage of multi-chip configurations is that the computational stage is decoupled from the sensing stage, so the orientation selectivity computational devices can be designed to be modular and expandable.

Within the multi-chip approaches there are two main streams: [6], [42], [44] and [45] implement specific architectures with local or hardwired connectivity for processing signals obtained from vision sensors, while [3] and [46] propose using general-purpose transceivers that rely on the AER communication infrastructure to construct receptive fields tuned to different orientations. The multi-chip system we designed is a hybrid of these two approaches, as it has local hardwired connections and also supports arbitrary connectivity patterns via additional AER synapses. Specifically, in our system we can map different types of sensory inputs (e.g. obtained from a silicon retina, a silicon cochlea, or other AER sensory systems) onto the network's AER synapses so as to implement cooperation and competition across different types of feature maps. The computational part of the system is not explicitly designed for orientation selectivity. Instead, it models a more generic computational module (representing a portion of a cortical module [29], [30]) that can be applied to the detection of other features and to other sensory modalities. In our specific application example, the receptive fields emerge both from the inter-chip feed-forward connectivity [31] and from the intra-chip recurrent cooperative-competitive connectivity.

In the next Section we describe the PCI-AER board and its supporting software. In Section III we present the application example on orientation tuning, and in Section IV we present a discussion and concluding remarks.

### II. THE COMMUNICATION INFRASTRUCTURE

### A. The hardware: PCI-AER board

The PCI-AER board takes the form of a 33 MHz, 32-bit, 5 V PCI bus add-in card (Fig. 1(a)). It was designed by V. Dante and the PCBs were manufactured by Ermes Technology S.R.L. (Via Ivrea 18, 10080 San Benigno Canavese, Italy). Most of the two dozen or so boards in existence were assembled by SMTEC AG (Gewerbestrasse 5, 8451 Kleinandelfingen, Switzerland). When the board is installed in a host PC, a 68-way cable connects it to a small header board (Fig. 1(b)), which can be conveniently located on the bench-top and provides connectors for up to four AER receivers and four AER senders. The header board also electrically buffers the signals to and from the receivers and senders. If only one receiver is used, all 16 bits of the AER bus are available to that receiver. If two receivers are used, the topmost bit distinguishes between them and only 15 bits of address are available to each receiver. If all four receiver channels are used, the two topmost bits distinguish between them and only 14 bits of address are available to each receiver. The number of channels to be used (1, 2 or 4) can be configured by software. Similarly, the senders may use 16, 15 or 14 bits of address according to whether the board is configured for 1, 2 or 4 sender chips. Senders must use the so-called 'SCX' multi-sender AER protocol [47], in which request and acknowledge signals are active low and the bus may only be driven while the acknowledge signal is active. Receivers may use either this 'SCX' protocol or a point-to-point protocol [10], in which request and acknowledge are active high and the bus is driven while request is active. The protocol generated by the board can be selected under software control.
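The channel-bit scheme can be sketched in Python as follows. The function names are hypothetical, and the assumption that channel *k* corresponds to top-bit value *k* is ours for illustration; the Hardware User Manual [48] is the authority on the actual assignment.

```python
ADDRESS_BITS = 16                  # width of the AER bus on the header board
CHANNEL_BITS = {1: 0, 2: 1, 4: 2}  # top bits used to select the channel


def split_address(word, n_channels):
    """Split a 16-bit bus word into (channel, local address).

    With 1 channel all 16 bits are address; with 2 channels the topmost bit
    selects the channel (15 address bits); with 4 channels the two topmost
    bits do (14 address bits).
    """
    local_bits = ADDRESS_BITS - CHANNEL_BITS[n_channels]
    channel = word >> local_bits
    local = word & ((1 << local_bits) - 1)
    return channel, local


def join_address(channel, local, n_channels):
    """Inverse of split_address: build the bus word for a given channel."""
    local_bits = ADDRESS_BITS - CHANNEL_BITS[n_channels]
    if not (0 <= channel < n_channels and 0 <= local < (1 << local_bits)):
        raise ValueError("channel or local address out of range")
    return (channel << local_bits) | local


print(split_address(0xC005, 4))   # (3, 5)
print(split_address(0xC005, 1))   # (0, 49157)
```

The same 16-bit word thus names local address 5 on channel 3 in 4-channel mode, but a single 16-bit address in 1-channel mode, which is why the channel count must be configured consistently in software.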

As illustrated in Fig. 2, the PCI-AER board can perform three functions, which are executed by blocks we refer to as the monitor, sequencer, and mapper. These blocks are implemented in two FPGAs on the board. The division of the functionality between the two FPGAs is a consequence of the data flow. One FPGA deals with all incoming events, whether from external sources or from the PCI bus, and optionally passes these events on to the mapper; hence this FPGA implements the monitor and sequencer functions. The other FPGA performs the mapper function (including managing the interface to the mapper's SRAM) and manages communication with the AER receivers.

The monitor can capture and timestamp events coming from the attached AER senders via an arbiter<sup>1</sup> and makes those events available to the PC for storage or further on-line processing. A timer is implemented in one of the FPGAs, and when an incoming address-event is read, a timestamp is stored along with the address in a first-in first-out (FIFO) memory. This FIFO decouples the handling of incoming address-events from read operations on the PCI bus, whose bandwidth must be shared with other peripherals in the PC such as the network card. The FIFOs fitted to the current boards are all 8K words deep; since each address occupies one word and each timestamp two words, this is sufficient to hold 2730 complete events<sup>2</sup>. Interrupts to the host PC can be generated when the FIFO becomes half-full and/or full. In the ideal case, the driver reads time-stamped address-events from the monitor FIFO whenever the host CPU receives a FIFO half-full interrupt, at a rate sufficient to keep the FIFO from filling up or overrunning at the current rate of incoming address-events. If the CPU fails to empty the FIFO fast enough, the FIFO fills up, the FIFO full LED on the header board lights, and a FIFO full interrupt is generated. From then on, incoming events are lost until the CPU can once again read from the FIFO. In the application example we present in Section III, the monitor is used to record the activity of a sender and a transceiver chip.

<sup>1</sup>The arbiter is a binary-tree arbiter in which each binary cell is a priority-based arbiter.

<sup>2</sup>The monitor can also be run without storing timestamps, in which case the FIFO can hold up to 8K event addresses.
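The FIFO arithmetic above also tells the host how quickly it must respond to a half-full interrupt. The following minimal sketch computes that deadline, assuming (as a simplification) a constant incoming event rate and that the host has read nothing since the interrupt fired; the function names are ours.

```python
FIFO_WORDS = 8 * 1024  # 8K-word monitor FIFO


def words_per_event(timestamps=True):
    """One address word, plus two timestamp words when timestamping is on."""
    return 3 if timestamps else 1


def fifo_capacity_events(timestamps=True):
    """Number of complete events the monitor FIFO can hold."""
    return FIFO_WORDS // words_per_event(timestamps)


def half_full_deadline_s(event_rate_hz, timestamps=True):
    """Seconds between the half-full interrupt and the FIFO becoming full."""
    remaining_events = (FIFO_WORDS // 2) / words_per_event(timestamps)
    return remaining_events / event_rate_hz


print(fifo_capacity_events())          # 2730
print(fifo_capacity_events(False))     # 8192
print(half_full_deadline_s(100_000))   # about 0.01365 s
```

At 100 k events/s, for example, the driver has roughly 14 ms to start draining the FIFO before events are lost.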

The sequencer allows events originated by the host PC to be sent out to the attached AER receivers. These events may, for example, represent a pre-computed, buffered stimulus pattern, but they might also be the result of a real-time computation. This allows, for instance, software simulations of VLSI devices that are still under development to provide input to real VLSI hardware; as soon as the real device is available, the software simulation can be seamlessly replaced. Like the monitor, the sequencer is decoupled from the PCI bus using an 8K-word FIFO. The host writes a sequence of words representing addresses and time delays to the sequencer FIFO. The sequencer then reads these words one at a time from the FIFO and either emits an address-event or waits the indicated number of microseconds. Since addresses and time delays are each represented in the sequencer by one word, and a stream of address-events usually consists of alternating addresses and time delays representing inter-spike intervals, the 8K-word FIFO can typically hold up to 4096 events. FIFO half-empty interrupts can be generated to signal the CPU to supply further data to the sequencer. If the CPU fails to supply data fast enough to keep the sequencer FIFO from becoming empty, the system may fail to generate the desired sequence of events with the desired timing; in this case a sequencer FIFO empty interrupt is raised to signal the underrun. The address-events generated by the sequencer pass through the mapper and can therefore be transmitted on any of the four output channels.
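The sequencer's read-a-word, then emit-or-wait behavior can be emulated in a few lines of Python. This is a sketch only: the board's actual encoding that distinguishes address words from delay words is documented in [48] and is not reproduced here, so words are represented as hypothetical tagged tuples.

```python
from typing import List, Tuple

# Hypothetical tagged representation of sequencer FIFO words.
Word = Tuple[str, int]  # ("delay", microseconds) or ("addr", address)


def build_sequence(events: List[Tuple[int, int]]) -> List[Word]:
    """Turn (isi_us, address) pairs into alternating delay/address words.

    Zero delays are skipped so that they do not waste FIFO words.
    """
    words: List[Word] = []
    for isi_us, addr in events:
        if isi_us > 0:
            words.append(("delay", isi_us))
        words.append(("addr", addr))
    return words


def run_sequencer(words: List[Word]) -> List[Tuple[int, int]]:
    """Emulate the sequencer: read words in order, waiting on delay words and
    emitting on address words. Returns (emission_time_us, address) pairs."""
    t_us = 0
    emitted = []
    for kind, value in words:
        if kind == "delay":
            t_us += value
        elif kind == "addr":
            emitted.append((t_us, value))
        else:
            raise ValueError(f"unknown word kind: {kind}")
    return emitted


words = build_sequence([(0, 5), (10, 7), (3, 5)])
print(run_sequencer(words))   # [(0, 5), (10, 7), (13, 5)]
```

With one word per delay and one per address, an 8K-word FIFO holds 8192 // 2 = 4096 events in the typical alternating case, matching the figure given above.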

The mapper implements programmable inter-chip synaptic connectivity. It maps incoming address-events from attached AER senders and/or the sequencer to one or more outgoing addresses for transmission to the attached AER receivers. It can operate in pass-through, one-to-one, or one-to-many mode. In pass-through mode, the outgoing addresses are the same as the incoming addresses. In one-to-one mode, each incoming address is mapped to one outgoing address by using the incoming address as the index into a look-up table stored in on-board SRAM; this look-up table holds the corresponding target output addresses. In one-to-many mode, each incoming address may be mapped to one or more outgoing addresses. This is achieved by using the contents of the look-up table as pointers to lists of output addresses, also stored in the 2M words of on-board SRAM.<sup>3</sup> The mapper also has a FIFO, which decouples the asynchronous reception of incoming address-events from the generation of outgoing address-events. Should this FIFO become full, events may be lost, and this can be signaled to the CPU by means of an interrupt. Once the mapper has been configured and the look-up table filled with the required mappings, it operates entirely independently of the host CPU, since all of the necessary operations, including table look-up, are performed by one of the FPGAs. To implement the multi-chip system described in Section III we used the mapper in one-to-many mode.

<sup>3</sup>In order to send the same address to more than one output channel, the appropriate target addresses must be listed serially in the output address lists.
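The three mapper modes can be modelled in software as follows. This is an illustrative sketch, not the FPGA logic: a Python dict stands in for the SRAM look-up table indexed by incoming address, and returning no outputs for an unmapped address is our assumption.

```python
from typing import Dict, List, Optional


class Mapper:
    """Software model of the pass-through, one-to-one and one-to-many modes.

    lut: incoming address -> target address (one-to-one) or
         index of an output address list (one-to-many).
    lists: output address lists used in one-to-many mode.
    """

    def __init__(self, mode: str, lut: Optional[Dict[int, int]] = None,
                 lists: Optional[List[List[int]]] = None):
        if mode not in ("pass-through", "one-to-one", "one-to-many"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.lut = lut if lut is not None else {}
        self.lists = lists if lists is not None else []

    def map(self, addr: int) -> List[int]:
        """Return the outgoing addresses generated by one incoming address."""
        if self.mode == "pass-through":
            return [addr]
        if addr not in self.lut:
            return []  # unmapped address produces no output (assumption)
        entry = self.lut[addr]
        if self.mode == "one-to-one":
            return [entry]
        return list(self.lists[entry])  # one-to-many: follow the pointer


fan_out = Mapper("one-to-many", lut={1: 0, 2: 1}, lists=[[10, 11, 12], [20]])
print(fan_out.map(1))   # [10, 11, 12]
print(fan_out.map(2))   # [20]
```

The one-to-many indirection keeps the table itself fixed-size (one entry per incoming address) while allowing each list to have an arbitrary length, which is what makes divergent synaptic connectivity cheap to store.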

A detailed description of the hardware (the Hardware User Manual) is available at [48].

### B. The supporting software

To enable the functionality of the board to be accessed robustly and conveniently, we have provided a Linux device driver and, on top of this, a C library. Both are fully documented, and this documentation is available together with the source code at [48]. The open-source driver provides full integration of the PCI-AER board under Linux following the Unix 'everything-is-a-file' model. This allows AE data streams to be accessed using standard Unix read and write calls and supports the use of standard shell redirection and command-line tools. The driver provides separate logical devices for each of the major functional blocks of the board (mapper, monitor, and sequencer) and supports multiple boards. It keeps the AE data streams coherent by serializing accesses from multiple programs running simultaneously. It also enforces access sizes that are whole multiples of the word size, to prevent corruption of the data streams due to misalignment. While read and write calls are used to read and write the AE data streams from and to the board's FIFOs, IOCTL (input/output control) calls are provided to set and get configuration states, and user programs are prevented from putting the board into an inconsistent state. The driver also manages the mapper look-up table memory, relieving users of the necessary but onerous and error-prone table indexing and pointer arithmetic, and thereby prevents the mapping table from becoming corrupted. Statistics (number of words read or written, number of interrupts, number of FIFO overruns or underruns, etc.) are also maintained by the driver for each logical device and made available to user programs.
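Because the driver exposes each block as a file, a user program can read AE data with ordinary file I/O. The sketch below shows a word-multiple read in Python; the 16-bit word size and little-endian byte order are assumptions, and an in-memory buffer stands in for the monitor device file, whose actual name is given in the driver documentation [48].

```python
import io
import struct

WORD_BYTES = 2  # assumption: 16-bit words, matching the 16-bit AER bus


def read_words(stream, max_words):
    """Read up to max_words whole words from a binary stream.

    The request size is always a multiple of WORD_BYTES, mirroring the
    driver's rule that accesses must be word-multiple sized. A trailing
    partial word indicates misalignment and is reported as an error.
    Byte order is assumed to be little-endian.
    """
    data = stream.read(max_words * WORD_BYTES)
    if len(data) % WORD_BYTES:
        raise IOError("read returned a partial word")
    n = len(data) // WORD_BYTES
    return list(struct.unpack(f"<{n}H", data))


# In-memory stand-in for the monitor device file
fake_device = io.BytesIO(struct.pack("<4H", 5, 100, 0, 7))
print(read_words(fake_device, 3))   # [5, 100, 0]
```

On a real system the stream would be the monitor device opened in unbuffered binary mode (`open(path, "rb", buffering=0)`).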

The library consists mainly of thin, fast wrapper functions around the driver open, close, read, write, flush and ioctl calls. Functions are also provided to convert from the PCI-AER hardware-specific format to a generic interspike interval/address-event format for reading, and vice versa for writing. The conversion function for reading also attempts recovery when data are received out of order because of monitor FIFO overruns or other (hardware) errors.
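The conversion from the hardware-specific format to an inter-spike interval/address-event format can be sketched as below. The three-word event layout (address, then low and high halves of a 32-bit microsecond timestamp) is a hypothetical choice consistent with "one address word and two timestamp words"; the real layout is defined in [48]. Timer wrap-around and the library's out-of-order recovery are not modelled.

```python
def to_isi_format(words):
    """Convert monitor words into (isi_us, address) pairs.

    Hypothetical layout: each event is three 16-bit words, the address
    followed by the low and high halves of a 32-bit microsecond timestamp.
    The first event's ISI is reported as 0.
    """
    if len(words) % 3:
        raise ValueError("stream does not contain a whole number of events")
    events = []
    prev_ts = None
    for i in range(0, len(words), 3):
        addr, ts_lo, ts_hi = words[i:i + 3]
        ts = (ts_hi << 16) | ts_lo            # reassemble the 32-bit timestamp
        isi = 0 if prev_ts is None else ts - prev_ts
        events.append((isi, addr))
        prev_ts = ts
    return events


print(to_isi_format([5, 100, 0, 7, 150, 0, 5, 4, 1]))
# [(0, 5), (50, 7), (65390, 5)]
```

Storing ISIs rather than absolute timestamps makes a recorded stream directly replayable through a sequencer, which also consumes inter-spike delays.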

### C. Performance

The performance of the PCI-AER hardware and driver must be considered together, since the board is effectively unusable without the driver. Several aspects of the performance depend strongly on the type of PC in which the board is installed and on which the driver runs, and on the version and configuration of the Linux kernel in use. Nonetheless, the driver is instrumented for measuring the throughput of the monitor and sequencer, and measurements have been performed using version 2.30 of the driver on a 1 GHz, 512 MB, AMD Athlon based machine running a SuSE 9.1 Linux distribution with kernel version 2.6.5. The PCI-AER board FPGA revisions used were 4202 (FPGA1) and 4203 (FPGA2). To eliminate as many outside influences as possible and obtain reproducible results, measurements were made with no graphical display system and no network stack running. The results of these measurements are shown in Table I. Performance data obtained using a more sophisticated software framework on top of a slightly earlier version of the driver on a different machine were presented in [28].

The limiting factors vary. The maximum sustainable monitor rate is determined principally by the speed at which the AE stream can be read from the board over the PCI bus and buffered in the PC's memory. This in turn depends on many factors, such as CPU speed, cache size and organization, and the PCI bus chipset. The maximum sustainable sequencer rate lies between the rates obtained with inter-spike intervals of $1\,\mu s$ and of $0\,\mu s$; since the best available resolution of the timer controlling the emission of events from the sequencer is $1\,\mu s$, the timer resolution is the limiting factor here. With a finer timer resolution, the data transfer rate would become the limiting factor.

The present board is unable to perform DMA to shift AE data across the PCI bus because the AMCC S5920 PCI interface chip [49] used does not support it, but it would certainly be interesting for future generations of hardware to use DMA, not to achieve greater throughput through the board (the same amount of data would still have to be shifted across the PCI bus), but to offload the CPU. Alternatively, with the present board, use might in future be made of the forthcoming Intel I/O Acceleration Technology (I/OAT)<sup>4</sup> in which the CPU includes a DMA subsystem.

Minimum AE cycle times on both the inputs from senders and the outputs to receivers depend only on the frequency of the clock on the board (20 MHz on the present board) and on the requirement to remain within the AER protocol specifications. They do not depend on the number of channels in use, although the latency before a request on one arbiter input channel is acknowledged may of course be influenced by events on other input channels, if those are in use.

Since the limiting factors lie principally on the PC side and not on the AER side, the overall bandwidth available for, say, monitoring remains constant irrespective of the number of channels in use; if multiple channels are monitored, they must therefore share the available bandwidth. Note, however, that when the board is not monitoring or sequencing, the bandwidth available on the PCI bus plays no role in the mapping performance, since the data path then runs from FPGA1 through the Mapper FIFO to FPGA2 and does not even involve the Local Bus on the board (refer again to Fig. 2).

The throughput of the mapper depends on whether it is operated in pass-through, one-to-one or one-to-many mode, and in the latter case on the length of the target address lists being used. In measuring the sequencer rates given in Table I, the mapper was used in pass-through mode, so at least in this mode it can sustain rates of $\approx 1\,\mathrm{M\,events\,s^{-1}}$. This would allow $10^4$ neurons to fire actively at a rate of 100 Hz. Assuming no more than 10% of neurons are active at one time, a network of the order of $10^5$ neurons could be supported by one PCI-AER card, but address space considerations restrict us to a maximum of 65536 neurons on the sender side and a maximum of 65536 synapses on the receiver side. If the network produces more spikes than the mapper can process in real time, spikes will eventually be lost once the mapper FIFO fills, but this will have no influence on the AE protocol cycle times observed on the sender side.

<sup>4</sup>http://www.intel.com/technology/ioacceleration/
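The capacity estimate in the preceding paragraph can be reproduced with a few lines of integer arithmetic. The sketch simply restates that reasoning (the function name is ours): mapper throughput divided by firing rate gives the number of simultaneously active neurons, dividing by the active fraction gives the network size, and the 16-bit address space caps the result.

```python
def supported_network_size(mapper_rate_eps, firing_rate_hz, active_percent,
                           address_bits=16):
    """Estimate the network size one PCI-AER board can serve.

    mapper_rate_eps: sustained mapper throughput (events/s)
    firing_rate_hz: firing rate of each active neuron
    active_percent: percentage of neurons active at any one time (integer)
    """
    active = mapper_rate_eps // firing_rate_hz
    network = active * 100 // active_percent
    return {"active": active, "network": network,
            "addressable": min(network, 2 ** address_bits)}


print(supported_network_size(1_000_000, 100, 10))
# {'active': 10000, 'network': 100000, 'addressable': 65536}
```

With the measured $\approx 1\,\mathrm{M\,events\,s^{-1}}$, the bandwidth-limited size ($10^5$ neurons) exceeds the address-limited size (65536), so the address space is the binding constraint.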

Although the driver supports the use of more than one PCI-AER board in one PC, the user inevitably remains limited by the characteristics of that host PC; in particular, the bandwidth available for monitoring and/or sequencing within the host system must be shared among all of the boards fitted to it. However, as noted above, when boards are only being used for mapping there is no impact on the host, so several boards fitted to the same PC could easily be used for mapping in larger AE systems while, for instance, only one board at a time performs a monitoring function. AEs do not need to be routed from one PCI-AER board to another via the PC; rather, one of the output channels of one board could be connected to one of the input channels of another board, or to the input of a transceiver chip whose output goes to the input of another board.

## III. APPLICATION EXAMPLE: ORIENTATION SELECTIVITY USING A SILICON RETINA AND A WINNER-TAKE-ALL NETWORK

### A. The orientation selectivity system components

The orientation selectivity system consists of two neuromorphic aVLSI AER chips, a PCI-AER board and supporting hardware (see Fig. 3). The neuromorphic chips are an address-event temporally differentiating vision sensor (TMPDIFF chip) [50] and a recurrent competitive network of I&F neurons and short-term dynamic synapses (IFWTA chip) [51].

The AEs generated by the TMPDIFF chip and sent to the IFWTA chip are routed by the PCI-AER board mapper. The PCI-AER board monitor is used to read all AEs generated by the two chips, timestamp them and log them on the host PC (see Fig. 3).

The supporting hardware comprises a custom Digital to Analog Converter (DAC) board [52] for setting the analog biases of the neuromorphic chips, an LCD screen for presenting visual stimuli, and a workstation for hosting and controlling the PCI-AER board, programming the DAC board and controlling the LCD screen.

The PCI-AER board mapper functionality was critical in this application example, as it allowed us to (re)configure the mapping between the TMPDIFF pixels and the IFWTA synapses. Similarly, the board's monitoring function allowed us to store arbitrarily large amounts of address-events generated by the system for off-line analysis. This multi-chip system was not developed to process real images; rather, it was designed to validate models of orientation selectivity and to illustrate the functionalities of the PCI-AER board.

1) The TMPDIFF chip: The TMPDIFF chip implements the sensing stage of our system. The chip produces asynchronous AEs in response to temporal changes in logarithmic intensity, so the stream of events encodes contrast change rather than absolute illumination change. The retinal computation is optimized to deliver relevant information and to discard redundancy by using high temporal and low spatial resolution, similar to the biological magnocellular pathway. Because the TMPDIFF chip responds only to temporal changes in logarithmic intensity, static scenes produce no output. AEs represent relative changes in image intensity, which are usually generated by viewpoint or object movement. The TMPDIFF pixel front-end photoreceptor circuits independently compute the temporal derivative of the logarithm of the pixel illumination $I$ in continuous time. The output of the photoreceptor circuit consists of an ON current for increasing intensities and an OFF current for decreasing intensities. The ON and OFF currents are proportional to the temporal derivative of $\ln I$. The ON current is fed as an input current into an I&F neuron circuit that communicates quantized logarithmic changes as ON-events. The OFF current is fed into another I&F neuron that produces OFF-events. As long as the temporal frequency of the visual stimulus is higher than the corner frequency of the input to the neuron circuits (2 Hz), each event means that the logarithmic intensity has changed by a certain fixed amount since the last event. If the absolute pixel illumination is $I$, then each event represents a quantized change $R$ in $\ln I$:

$$R = d\ln I = dI/I \tag{1}$$

Thus the temporal derivative is self-normalized: the number of events produced depends only on the relative change $dI/I$, not on the absolute illumination. Pixel output consists of the stream of ON and OFF events. This vision sensor, more thoroughly described in [53] and [50], consists of an array of $32 \times 32$ pixels, a y-arbiter, an x-arbiter and a common address bus with two encoders [11]. An event occurring in a pixel is communicated off-chip as an 11-bit address that encodes the pixel's X-Y location and the polarity (ON or OFF) of the event. Events are processed asynchronously in order of their arrival time; when events collide, the later ones are queued. The vision sensor is a real-time device, as events are typically communicated within 100 ns of their occurrence. The AER communication system is particularly well suited for this application because it dedicates the full communication bandwidth to the active pixels of the vision sensor and preserves timing information. In response to a flashed bar, for example, a burst of a few hundred events is typically emitted within the first few milliseconds after the bar appears or disappears; these bursts are preceded and followed by periods of zero activity. The maximum event rate is about 2 Mevents/s. With sparse activation a very high temporal resolution is achievable, comparable to that of a frame-based imager running at several kHz.
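With a $32 \times 32$ array each coordinate fits in 5 bits, which together with one polarity bit gives the 11-bit address. The Python sketch below packs and unpacks such an address; the bit order (y in the high bits, polarity in bit 0) is an assumption made for illustration, since the chip's actual layout is not specified here.

```python
# Hypothetical 11-bit TMPDIFF address layout (the chip's actual bit order is
# not given in the text): bits 10..6 = y, bits 5..1 = x, bit 0 = polarity.
SIZE = 32            # 32 x 32 array -> 5 bits per coordinate
OFF, ON = 0, 1


def encode(x, y, polarity):
    """Pack pixel coordinates and polarity into an 11-bit address."""
    if not (0 <= x < SIZE and 0 <= y < SIZE and polarity in (OFF, ON)):
        raise ValueError("coordinate or polarity out of range")
    return (y << 6) | (x << 1) | polarity


def decode(address):
    """Unpack an 11-bit address into (x, y, polarity)."""
    if not 0 <= address < 2 ** 11:
        raise ValueError("address does not fit in 11 bits")
    return (address >> 1) & 0x1F, (address >> 6) & 0x1F, address & 1
```

Under this layout, `encode(31, 31, ON)` returns 2047, the largest 11-bit value.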

2) The IFWTA chip: The architecture of the IFWTA chip is shown in Fig. 4(a). It is a two-dimensional array containing a row of 32 I&F neurons, each connected to a column of afferent synaptic circuits. Each column contains 14 AE excitatory synapses, 2 AE inhibitory synapses and 6 locally connected (hardwired) synapses. When an address-event is received, the synapse with the corresponding row and column address is stimulated. If the synaptic currents driven by the AEs routed to a neuron charge it up to its spiking threshold, that neuron generates an address-event which is transmitted off-chip. The AE input synapses can be used to implement arbitrary network architectures by (re)mapping address-events via the PCI-AER board.
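The integrate-to-threshold behaviour described above can be sketched as an event-driven leaky I&F neuron. This is a behavioural illustration only, not a model of the IFWTA circuits: the weight, leak time constant, threshold and the floor at the resting level are hypothetical values chosen for readability.

```python
import math


def integrate_and_fire(input_events, weight=0.3, tau=0.02, threshold=1.0):
    """Event-driven leaky integrate-and-fire sketch (hypothetical parameters).

    input_events: time-sorted (time_s, sign) pairs, sign = +1 for an
    excitatory AE synapse and -1 for an inhibitory one.
    Returns the times at which the neuron would emit an address-event.
    """
    v, t_last, spikes = 0.0, 0.0, []
    for t, sign in input_events:
        v *= math.exp(-(t - t_last) / tau)  # leak since the previous input
        v = max(0.0, v + sign * weight)     # synaptic jump, floored at rest
        t_last = t
        if v >= threshold:                  # threshold reached: emit an AE
            spikes.append(t)
            v = 0.0                         # reset after the spike
    return spikes
```

With these values, four excitatory events 1 ms apart are enough to reach threshold, whereas the same four events spaced 1 s apart are not, because the membrane leaks back to rest in between.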

Synapses with local hardwired connectivity are used to realize a cooperative-competitive network with recurrent interactions (see Fig. 4(b)): 31 neurons of the array send their spikes to 31 local excitatory synapses on the global inhibitory neuron; the inhibitory neuron, in turn, stimulates the local inhibitory synapses of the 31 excitatory neurons; and each excitatory neuron stimulates its first and second neighbors on both sides using two sets of locally connected excitatory synapses. The first- and second-neighbor connections of the neurons at the edges of the array are connected to pads. This allows us to leave the network open, or to implement closed boundary conditions (forming a ring of neurons [54]), using off-chip jumpers. The local synapses are non-linear integrators which produce analog currents in response to digital input spikes. The local hardwired connectivity was implemented in this way to reduce AER bandwidth usage while keeping the additional silicon area small with respect to the overall network size. Furthermore, it provides the flexibility to use the chip as a standalone module for single-chip experiments in which there is no need for mapping.
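The hardwired connectivity can be written down as a weight matrix. In this Python sketch the weight values are hypothetical, index 31 stands for the global inhibitory neuron, `ring=True` corresponds to closing the boundary with the off-chip jumpers, and `ring=False` to leaving the network open.

```python
def build_connectivity(n_exc=31, w_nn1=1.0, w_nn2=0.5, w_ei=1.0, w_ie=-1.0,
                       ring=True):
    """Return W, where W[post][pre] is the (hypothetical) weight from pre to post.

    Neurons 0..n_exc-1 are excitatory; neuron n_exc is the global inhibitory one.
    """
    n = n_exc + 1
    inh = n_exc
    W = [[0.0] * n for _ in range(n)]
    for i in range(n_exc):
        for d, w in ((1, w_nn1), (2, w_nn2)):   # first and second neighbors
            for j in (i - d, i + d):
                if ring:
                    j %= n_exc                  # closed boundary: wrap around
                elif not 0 <= j < n_exc:
                    continue                    # open boundary: no connection
                W[j][i] = w                     # neuron i excites neighbor j
        W[inh][i] = w_ei                        # excitatory -> global inhibitory
        W[i][inh] = w_ie                        # global inhibitory -> excitatory
    return W
```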

All of the synapses on the chip can be switched off by appropriately setting the external bias voltages that control their synaptic weights; the local and AER synapses are controlled by independent bias voltages. This allows us to inactivate either the local or the AE synaptic connections, or to use them in some arbitrary combination. A detailed description of the IFWTA chip was presented in [51], [55].

### B. Orientation selectivity experiments

In our application example, broad orientation selectivity is achieved by appropriately mapping feed-forward connections from the TMPDIFF pixels to the IFWTA chip neurons (via the PCI-AER board), and it is sharpened by activating the local recurrent connections on the IFWTA chip. The feed-forward mapping is set so that each IFWTA neuron collects all the TMPDIFF ON and OFF events that belong to a bar with a specific orientation and position (discarding polarity), as shown in Fig. 5. We mapped 31 different groups of TMPDIFF pixels onto 31 neurons of the IFWTA chip so as to form 31 differently oriented receptive fields. (The orientations of these receptive fields are indicated by the bars shown as insets in Fig. 6(a).)
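One way to generate such a feed-forward mapping is sketched below: neuron *k* collects every pixel lying close to a bar through the array center at angle $k \cdot 180^\circ/31$. The geometry (bar half-width, evenly spaced angles) is an assumption for illustration; the actual receptive fields are those sketched in Fig. 5. With this construction the pixels nearest the center fall inside every bar, so they are mapped to all neurons.

```python
import math


def oriented_bar_fields(n_neurons=31, size=32, half_width=0.75):
    """Hypothetical oriented receptive fields for the TMPDIFF -> IFWTA mapping.

    Neuron k collects the pixels within half_width pixels of a bar through
    the array center at angle k * 180 / n_neurons degrees. Returns
    {(x, y): [neuron, ...]}; ON and OFF events from a pixel would be mapped
    identically, i.e. polarity is discarded.
    """
    c = (size - 1) / 2.0                          # array center
    table = {}
    for k in range(n_neurons):
        theta = math.pi * k / n_neurons
        ux, uy = math.cos(theta), math.sin(theta)  # bar direction
        for x in range(size):
            for y in range(size):
                dx, dy = x - c, y - c
                dist = abs(dx * uy - dy * ux)      # distance from the bar axis
                if dist <= half_width:
                    table.setdefault((x, y), []).append(k)
    return table
```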

In our experiments we displayed flashing oriented white bars on a dark background to the TMPDIFF chip. The activity of the TMPDIFF chip was monitored by the PCI-AER board and transmitted (via the PCI-AER board mapping tables) to the IFWTA chip. Using the PCI-AER board, we time-stamped and logged both the TMPDIFF and IFWTA address-events for data analysis. To characterize the system we recorded its activity in response to bars of 30 different orientations (spaced 6° apart), chosen independently of the set of pre-wired preferred orientations. Each oriented bar was flashed at a rate of about 2.5 Hz, producing one ON and one OFF transition per cycle, and the address-event data were monitored for 25 s. Because it was not easy to synchronize the stimulus onset with the start of monitoring, we started monitoring 5 s after issuing the command to start the stimulus; this ensured that the stimulus was already being presented when monitoring began.

We repeated the same experiment under two different conditions of local connectivity on the WTA chip. In the first condition the biases of the WTA chip were set to implement a purely feed-forward model: local recurrent synapses were inactive and the neurons' inputs were completely determined by the activity of the retinal pixels. Subsequently, we activated the recurrent connectivity to implement the feed-back model, keeping all other parameters unchanged. Three sets of local synapses were used to implement the feed-back model: (1) first-neighbor excitatory-to-excitatory synapses, to model the mutually excitatory connections among cells with similar preferred orientations; (2) inhibitory and (3) excitatory synapses connecting the global inhibitory neuron to the excitatory neurons and vice versa, to model the mutual inhibition among cells with different preferred orientations (see Fig. 4(b)). The effect of competition alone is described in [55].


Orientation tuning curves (*i.e.* graphs of neural response vs. stimulus orientation) are typically measured when characterizing orientation selectivity in visual cortical neurons. We applied the same analysis to our data: the recorded activity of the WTA neurons was used to compute the mean firing rate of each neuron in response to each stimulus, and tuning curves were obtained by plotting these rates for each neuron as a function of stimulus orientation. Fig. 6(a) shows the computed tuning curves for each neuron of the IFWTA chip. Each sub-figure shows the mean response of one neuron to the different orientations, and its inset shows the retinal pixels mapped to that neuron.
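Computing the tuning curves from the logged data reduces to counting spikes per neuron for each stimulus. The sketch below assumes a hypothetical log format: for each stimulus orientation, a list of `(timestamp_s, neuron)` pairs already decoded from the IFWTA address-events.

```python
from collections import Counter


def mean_rates(events, n_neurons, duration_s):
    """Mean firing rate (Hz) of each neuron from (timestamp_s, neuron) events."""
    counts = Counter(neuron for _, neuron in events)
    return [counts[i] / duration_s for i in range(n_neurons)]


def tuning_curves(recordings, n_neurons, duration_s):
    """recordings: {orientation_deg: event list recorded for that stimulus}.

    Returns {neuron: [(orientation_deg, rate_hz), ...]} sorted by orientation.
    """
    curves = {i: [] for i in range(n_neurons)}
    for theta in sorted(recordings):
        rates = mean_rates(recordings[theta], n_neurons, duration_s)
        for i, r in enumerate(rates):
            curves[i].append((theta, r))
    return curves
```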

The central TMPDIFF pixels are mapped to all neurons, so each WTA neuron also receives input events when a non-preferred orientation is presented to the retina. The effect of this baseline input is clearly visible in the feed-forward model, where the activity of the WTA neurons simply reflects the input from the retina: the frequencies in the tuning curves are greater than zero for all orientations, with a maximum at the preferred orientation. In the feed-back model the baseline activity is suppressed and the response to the preferred orientation is amplified.

We fitted the tuning curves to quantitatively estimate the effect of recurrent connectivity on the response of the orientation selective neurons. We used a von Mises function as the fitting function [56], defined as

$$M(\theta) = A e^{k[\cos 2(\theta - \phi) - 1]} \tag{2}$$

where A is the value of the function at the preferred orientation  $\phi$ , and k is a width parameter, from which the half-width at half-height  $\theta_{0.5}$  may be calculated (in radians) as:

$$\theta_{0.5} = 0.5 \arccos[(\ln 0.5 + k)/k]; \ k > -0.5 \ln 0.5 \tag{3}$$
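Eq. (3) follows from requiring the von Mises function to fall to half its peak value at $\theta = \phi + \theta_{0.5}$:

$$
\begin{aligned}
A e^{k[\cos 2\theta_{0.5} - 1]} &= \tfrac{1}{2} A \\
k(\cos 2\theta_{0.5} - 1) &= \ln 0.5 \\
\cos 2\theta_{0.5} &= \frac{\ln 0.5 + k}{k} \\
\theta_{0.5} &= 0.5 \arccos\left[\frac{\ln 0.5 + k}{k}\right]
\end{aligned}
$$

The arccos argument must not fall below $-1$, which for $k > 0$ gives the condition on $k$ stated in Eq. (3), with $-0.5 \ln 0.5 \approx 0.35$; for smaller $k$ the curve never drops to half its peak and the half-width is undefined.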

The von Mises function approximates a Gaussian in shape over a biologically plausible range of values of k. A least-squares fit of the data to the von Mises function was used to estimate the tuning-curve parameters of each orientation-selective neuron.
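A least-squares fit of Eq. (2) can be sketched without external libraries by noting that, for fixed $k$ and $\phi$, the model is linear in $A$, so the best $A$ has a closed form and only $k$ and $\phi$ need to be searched. The grid search below is an illustrative choice, not necessarily the optimizer used for the reported fits; the grid ranges and step sizes are assumptions. Angles are in radians, and `hwhh` implements Eq. (3).

```python
import math


def von_mises(theta, A, k, phi):
    """Eq. (2): A * exp(k * (cos(2 * (theta - phi)) - 1))."""
    return A * math.exp(k * (math.cos(2 * (theta - phi)) - 1))


def hwhh(k):
    """Eq. (3): half-width at half-height in radians."""
    if k < -0.5 * math.log(0.5):
        raise ValueError("the curve never falls to half height for this k")
    c = max(-1.0, (math.log(0.5) + k) / k)  # clamp rounding at the boundary
    return 0.5 * math.acos(c)


def fit_von_mises(thetas, rates, k_grid=None, phi_step_deg=1):
    """Least-squares fit of Eq. (2) by grid search over (k, phi).

    For fixed k and phi the optimal amplitude is A = sum(y*g) / sum(g*g),
    with g = von_mises(theta, 1, k, phi). Returns (A, k, phi, sse).
    """
    if k_grid is None:
        k_grid = [i / 20 for i in range(1, 101)]  # k from 0.05 to 5.0
    best = None
    for d in range(0, 180, phi_step_deg):         # orientation period is 180 deg
        phi = math.radians(d)
        for k in k_grid:
            g = [von_mises(t, 1.0, k, phi) for t in thetas]
            A = sum(y * gi for y, gi in zip(rates, g)) / sum(gi * gi for gi in g)
            sse = sum((y - A * gi) ** 2 for y, gi in zip(rates, g))
            if best is None or sse < best[3]:
                best = (A, k, phi, sse)
    return best
```

With the 30 stimulus orientations spaced 6° apart, `thetas` would be `[math.radians(6 * i) for i in range(30)]`. A finer grid, or a local optimizer started from the grid-search result, would refine the estimates.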

Figure 6(b) shows the tuning curve of the neuron tuned to the vertical orientation: the data and the fitted von Mises function are plotted for both the feed-forward and the feed-back model. The data points used to perform the fits are the mean frequencies of the neurons computed over the 25 s of data acquisition. The IFWTA chip is stimulated only during and shortly after the appearance and disappearance of the bar, when the ON and OFF pixels of the TMPDIFF chip are activated by the visual stimulation. This induces high variability in the pattern of activity of the TMPDIFF and IFWTA chips (see Fig. 8), with bursts of events at the appearance and disappearance of the flashing bar and gaps of no activity in between. Ideally, the spike rate during each single burst should be measured and treated as a single measurement; the mean and standard deviation over many repetitions of this measurement would then provide a good estimate of the mean frequency and its variation. To simplify the analysis and start from more reliable individual measurements, we instead divided the 25 s acquisition time into five 5 s intervals and took the mean frequency in each interval as a single measurement of the neuron's mean frequency in response to the stimulus. The variability of our data (shown as error bars in Fig. 6(b)) is then computed as the standard deviation over the five measured frequencies.
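The windowing procedure can be sketched as follows for a single neuron's spike times (in seconds, relative to the start of monitoring). The use of the sample standard deviation ($n-1$ denominator) is an assumption; the text does not state which estimator was used.

```python
import statistics


def windowed_rate(spike_times, t_start=0.0, total_s=25.0, n_windows=5):
    """Mean rate and its variability from n_windows equal intervals.

    Splits [t_start, t_start + total_s) into n_windows intervals, computes
    the firing rate (Hz) in each, and returns (mean, sample std) of the rates.
    """
    width = total_s / n_windows
    counts = [0] * n_windows
    for t in spike_times:
        idx = int((t - t_start) // width)
        if 0 <= idx < n_windows:        # ignore spikes outside the recording
            counts[idx] += 1
    rates = [c / width for c in counts]
    return statistics.mean(rates), statistics.stdev(rates)
```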

To evaluate the goodness of the fits we used the R-square value (the square of the correlation between the measured values and the values predicted by the fit). It can take any value between 0 and 1, with a value closer to 1 indicating a better fit. We calculated R-square for all the fits: the mean of all the computed values is 0.982 with a standard deviation of $9 \times 10^{-3}$, which indicates that on average the fits explain 98% of the total variation in the data.
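For reference, the R-square statistic as defined here, the squared Pearson correlation between measured and fitted values, can be computed with the minimal sketch below (the function name is ours).

```python
def r_square(measured, predicted):
    """Square of the Pearson correlation between data and fitted values."""
    n = len(measured)
    mx = sum(measured) / n
    my = sum(predicted) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(measured, predicted))
    sxx = sum((x - mx) ** 2 for x in measured)
    syy = sum((y - my) ** 2 for y in predicted)
    return sxy * sxy / (sxx * syy)
```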

Figures 7(a) and 7(b) show the estimated amplitude and half-width at half-height, respectively, for all the neurons in the network in the feed-back versus the feed-forward configuration. All neurons lie above the diagonal in Fig. 7(a), showing that the response to the preferred orientation is amplified in the feed-back network with respect to the purely feed-forward network. Sharpening of the tuning is shown in Fig. 7(b), where neurons tend to lie below the diagonal. The population mean values of these parameters, together with the baseline activity and the preferred-orientation error, are listed in Table II. On average, the peak activity in the feed-back network is twice that in the feed-forward network, and the ratio between the half-widths at half-height in the two configurations (feed-back over feed-forward) is 0.9.
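The population comparison can be sketched as follows, given per-neuron fit parameters (amplitude or half-width, all positive) for the two configurations in the same neuron order. Since the text does not say whether the reported factors are ratios of population means or means of per-neuron ratios, this hypothetical helper returns both, along with the fraction of neurons above the diagonal.

```python
def compare_populations(ff, fb):
    """Compare a fit parameter between feed-forward (ff) and feed-back (fb).

    Returns (fraction of neurons with fb > ff, i.e. above the diagonal,
             ratio of population means fb/ff,
             mean of per-neuron ratios fb/ff).
    """
    n = len(ff)
    above = sum(b > a for a, b in zip(ff, fb)) / n
    ratio_of_means = (sum(fb) / n) / (sum(ff) / n)
    mean_of_ratios = sum(b / a for a, b in zip(ff, fb)) / n
    return above, ratio_of_means, mean_of_ratios
```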

### IV. CONCLUSIONS AND OUTLOOK

We have presented a flexible hardware/software infrastructure for building complex neuromorphic systems using the AER. It provides monitoring, sequencing and mapping functions easily accessible through the software interface, and it allows convenient and rapid prototyping (*e.g.* by stimulating, monitoring and/or experimenting with different address-space mappings). The PCI-AER board is therefore an ideal tool for developing single and multi-chip AER systems. Additional application examples that rely on this PCI-AER board have been recently presented in [28], [57]–[60]. This infrastructure and its documentation have reached the point at which they can easily be used by researchers and labs that were not involved in their development. At the time of writing, five such labs have acquired one or more boards each. Some users have written small C or C++ applications for spike-train generation and data logging directly using the library API (Application Programming Interface). A Matlab toolbox [61] has been developed for the off-line generation of spike trains to be sent to the PCI-AER board via the library and driver. A client-server architecture [28] has also been developed on top of the library to enable the use of the board on-line from within Matlab, including real-time data display.

Future developments should include a refinement of this client-server architecture to enable multiple data-sinks to read the monitored AE stream concurrently in a coordinated way. Other possible future developments include Java support, and a stimulation tool for the on-line generation of AE patterns to drive the sequencer. The instrumentation of the driver and consequent availability of performance data will aid the assessment of the present communication infrastructure, and can be used to guide future driver optimization work. The library could also be ported to other AER monitoring, mapping and sequencing hardware providing cross-platform compatibility for higher level software.

We implemented an orientation selectivity system composed of a sensing stage (the TMPDIFF chip) and a computational module (the IFWTA chip) that was not explicitly designed for this purpose, by using a specific mapping between the two chips. We showed that the recurrent connectivity in the computational module affects the response to oriented stimuli in a way similar to that described in theoretical models of orientation selectivity.

This experiment demonstrates the feasibility of real-time AER-based inter-chip communication through the PCI-AER interface. The orientation selectivity system was assembled exploiting the monitoring and mapping functionality of the PCI-AER board, thereby demonstrating the capabilities of the board. Other experiments exploiting the monitor and sequencing functionality have been described elsewhere (*e.g.* [57], [62]–[64]).

### REFERENCES

- [1] W. Maass and C. M. Bishop, Pulsed Neural Networks. MIT Press, 1998.
- [2] E. Chicca, D. Badoni, V. Dante, M. D'Andreagiovanni, G. Salina, S. Fusi, and P. Del Giudice, "A VLSI recurrent network of integrate-and-fire neurons connected by plastic synapses with long term memory," *IEEE Transactions on Neural Networks*, vol. 14, no. 5, pp. 1297–1307, September 2003.
- [3] D. Goldberg, G. Cauwenberghs, and A. Andreou, "Probabilistic synaptic weighting in a reconfigurable network of VLSI integrate-and-fire neurons," *Neural Networks*, vol. 14, no. 6–7, pp. 781–793, Sep 2001.
- [4] R. J. Vogelstein, F. Tenore, R. Philipp, M. S. Adlerstein, D. H. Goldberg, and G. Cauwenberghs, "Spike timing-dependent plasticity in the address domain," in *Advances in Neural Information Processing Systems*. Cambridge, MA: MIT Press, 2003.
- [5] G. Indiveri, T. Horiuchi, E. Niebur, and R. Douglas, "A competitive network of spiking VLSI neurons," in World Congress on Neuroinformatics, ser. ARGESIM Report no. 20, F. Rattay, Ed. Vienna: ARGESIM / ASIM - Verlag, 2001, pp. 443–455.
- [6] P. Merolla and K. Boahen, "A recurrent model of orientation maps with simple and complex cells," in Advances in Neural Information Processing Systems. MIT Press, December 2004, vol. 16, pp. 995–1002.
- [7] F. Tenore, J. Vogelstein, R. Etienne-Cummings, G. Cauwenberghs, and P. Hasler, "A floating-gate programmable array of silicon neurons for central pattern generating networks," in *Proceedings of IEEE International Symposium on Circuits and Systems*. IEEE, 2006.
- [8] M. Mahowald, "VLSI analogs of neuronal visual processing: a synthesis of form and function," Ph.D. dissertation, Department of Computation and Neural Systems, California Institute of Technology, Pasadena, CA., 1992.
- [9] J. Lazzaro, J. Wawrzynek, M. Mahowald, M. Sivilotti, and D. Gillespie, "Silicon auditory processors as computer peripherals," *IEEE Transactions on Neural Networks*, vol. 4, pp. 523–528, 1993.
- [10] "The address-event representation communication protocol AER 0.02," Caltech internal memo, February 1993, http://www.ini.unizh.ch/~amw/scx/std002.pdf.
- [11] K. Boahen, "Communicating neuronal ensembles between neuromorphic chips," in *Neuromorphic Systems Engineering*, T. S. Lande, Ed. Norwell, MA: Kluwer Academic, 1998, pp. 229–259.
- [12] K. A. Boahen, "Point-to-point connectivity between neuromorphic chips using address-events," *IEEE Transactions on Circuits and Systems II*, vol. 47, no. 5, pp. 416–34, 2000.
- [13] K. A. Boahen, "A burst-mode word-serial address-event link I: Transmitter design," *IEEE Transactions on Circuits and Systems I*, vol. 51, no. 7, pp. 1269–80, 2004.
- [14] Z. Kalayjian and A. Andreou, "Asynchronous communication of 2D motion information using winner-takes-all arbitration," in *Neuromorphic Systems Engineering*, T. S. Lande, Ed. Norwell, MA: Kluwer Academic, 1998, pp. 217–27.
- [15] E. Culurciello, R. Etienne-Cummings, and K. Boahen, "Arbitrated address-event representation digital image sensor," *Electronics Letters*, vol. 37, no. 24, pp. 1443–1445, Nov 2001.
- [16] P. Lichtsteiner, C. Posch, and T. Delbrück, "A 128×128 120dB 30mW asynchronous vision sensor that responds to relative intensity change," in 2006 IEEE ISSCC Digest of Technical Papers. IEEE, 2006, pp. 508–509.
- [17] A. van Schaik and S.-C. Liu, "AER EAR: A matched silicon cochlea pair with address event representation interface," in *IEEE International Symposium on Circuits and Systems*, vol. V, pp. 4213–4216, May 2005.
- [18] G. Indiveri, A. Whatley, and J. Kramer, "A reconfigurable neuromorphic VLSI multi-chip system applied to visual motion computation," in *Proceedings of the Seventh International Conference on Microelectronics for Neural, Fuzzy and Bio-inspired Systems; Microneuro'99*. Los Alamitos, CA: IEEE Computer Society, April 1999, pp. 37–44.

- [19] S. R. Deiss, R. J. Douglas, and A. M. Whatley, "A pulse-coded communications infrastructure for neuromorphic systems," in *Pulsed Neural Networks*, W. Maass and C. M. Bishop, Eds. MIT Press, 1998, ch. 6, pp. 157–78.
- [20] A. Mortara, E. Vittoz, and P. Venier, "A communication scheme for analog VLSI perceptive systems," *IEEE Journal of Solid-State Circuits*, vol. 30, pp. 660–9, 1995.
- [21] V. Dante, P. Del Giudice, and A. M. Whatley, "PCI-AER hardware and software for interfacing to address-event based neuromorphic systems," *The Neuromorphic Engineer*, vol. 2, no. 1, pp. 5–6, 2005, http://ine-web.org/research/newsletters/index.html.
- [22] R. Serrano-Gotarredona, M. Oster, P. Lichtsteiner, A. Linares-Barranco, R. Paz-Vicente, F. Gómez-Rodríguez, H. Kolle Riis, T. Delbrück, S. C. Liu, S. Zahnd, A. M. Whatley, R. J. Douglas, P. Häfliger, G. Jimenez-Moreno, A. Civit, T. Serrano-Gotarredona, A. Acosta-Jiménez, and B. Linares-Barranco, "AER building blocks for multi-layer multi-chip neuromorphic vision systems," in *Advances in Neural Information Processing Systems*, S. Becker, S. Thrun, and K. Obermayer, Eds., vol. 15. MIT Press, Dec 2005.
- [23] D. P. M. Northmore and J. G. Elias, "Building silicon nervous systems with dendritic tree neuromorphs," in *Pulsed Neural Networks*, W. Maass and C. M. Bishop, Eds. MIT Press, 1998, ch. 5, pp. 135–156.
- [24] C. M. Higgins and C. Koch, "A modular multi-chip neuromorphic architecture for real-time visual motion processing," *Analog Integrated Circuits and Signal Processing*, vol. 24, pp. 195–211, 2000.
- [25] P. Merolla and K. Boahen, "Dynamic computation in a recurrent network of heterogeneous silicon neurons," in *IEEE International Symposium on Circuits and Systems, ISCAS 2006.* IEEE, May 2006.
- [26] F. Gomez-Rodriguez, R. Paz, A. Linares-Barranco, M. Rivas, L. Miro, S. Vicente, G. Jimenez, and A. Civit, "AER tools for communications and debugging," in *Proceedings of IEEE International Symposium on Circuits and Systems*. IEEE, 2006, pp. 3253–3256.
- [27] R. Berner, "Diploma thesis: High speed USB2.0 AER interfaces," 2006, University of Zürich, ETH Zürich and Universidad de Sevilla.
- [28] M. Oster, A. M. Whatley, S.-C. Liu, and R. J. Douglas, "A hardware/software framework for real-time spiking systems," in Artificial Neural Networks: Biological Inspirations — ICANN 2005: 15th International Conference, Warsaw, Poland, September 11-15, 2005. Proceedings, Part I, ser. Lecture Notes in Computer Science, W. Duch, J. Kacprzyk, E. Oja, et al., Eds., vol. 3696. Springer-Verlag GmbH, Sep 2005, pp. 161–166.
- [29] R. J. Douglas and K. A. C. Martin, "Neural circuits of the neocortex," *Annual Review of Neuroscience*, vol. 27, pp. 419–51, 2004.
- [30] R. Douglas, K. Martin, and D. Whitteridge, "A canonical microcircuit for neocortex," *Neural Computation*, vol. 1, pp. 480–488, 1989.
- [31] D. Hubel and T. Wiesel, "Receptive fields, binocular interaction and functional architecture in the cat's visual cortex," *Jour. Physiology*, vol. 160, pp. 106–54, 1962.
- [32] D. Ferster and K. D. Miller, "Neural mechanisms of orientation selectivity in the visual cortex," *Annu. Rev. Neurosci.*, vol. 23, pp. 441–71, 2000.
- [33] H. Sompolinsky and R. Shapley, "New perspective on the mechanisms for orientation selectivity," *Current Opinion in Neurobiology*, vol. 7, pp. 514–22, 1997.
- [34] T. W. Troyer, A. E. Krukowski, N. J. Priebe, and K. D. Miller, "Contrast-invariant orientation tuning in cat visual cortex: Thalamocortical input tuning and correlation-based intracortical connectivity," *The Journal of Neuroscience*, vol. 18, no. 15, pp. 5908–27, 1998.
- [35] R. Shapley, M. Hawken, and D. L. Ringach, "Dynamics of orientation selectivity in the primary visual cortex and the importance of cortical inhibition," *Neuron*, vol. 38, pp. 689–99, 2003.
- [36] D. C. Somers, S. B. Nelson, and M. Sur, "An emergent model of orientation selectivity in cat visual cortical simple cells," *The Journal of Neuroscience*, vol. 15, pp. 5448–65, 1995.
- [37] R. Ben-Yishai, R. Lev Bar-Or, and H. Sompolinsky, "Theory of orientation tuning in visual cortex," *Proceedings of the National Academy of Sciences of the USA*, vol. 92, no. 9, pp. 3844–3848, April 1995.
- [38] R. J. Douglas, M. A. Mahowald, and K. A. C. Martin, "Hybrid analog-digital architectures for neuromorphic systems," in *Proc. IEEE World Congress on Computational Intelligence*, vol. 3. IEEE, 1994, pp. 1848–1853.
- [39] T. Serrano-Gotarredona, A. G. Andreou, and B. Linares-Barranco, "AER imager filtering architecture for vision-processing systems," *IEEE Transactions on Circuits and Systems I*, vol. 46, pp. 1064–71, 1999.
- [40] G. Cauwenberghs and J. Waskiewicz, "Focal-plane analog VLSI cellular implementation of the boundary contour system," *IEEE Transactions on Circuits and Systems I*, vol. 46, no. 2, pp. 1064–71, 1999.
- [41] B. E. Shi, "A low-power orientation-selective vision sensor," *IEEE Transaction on Circuits and Systems II*, vol. 47, no. 5, pp. 435–40, 2000.

- [42] P. Venier, A. Mortara, X. Arreguit, and E. A. Vittoz, "An integrated cortical layer for orientation enhancement," *IEEE Journal of Solid–State Circuits*, vol. 32, no. 2, pp. 177–86, 1997.
- [43] S.-C. Liu, J. Kramer, G. Indiveri, T. Delbruck, T. Burg, and R. Douglas, "Orientation-selective aVLSI spiking neurons," *Neural Networks*, vol. 14, no. 6/7, pp. 629–643, 2001.
- [44] T. Y. W. Choi, P. A. Merolla, J. V. Arthur, K. A. Boahen, and B. E. Shi, "Neuromorphic implementation of orientation hypercolumns," *IEEE Transactions on Circuits and Systems I*, vol. 52, no. 6, pp. 1049–60, 2005.
- [45] K. Shimonomura and T. Yagi, "An orientation-selective multi-chip aVLSI applicable to texture analysis," in *International Joint Conference on Neural Networks*. IEEE, 2005, pp. 3267–3271.
- [46] U. Mallik, R. J. Vogelstein, E. Culurciello, R. Etienne-Cummings, and G. Cauwenberghs, "A real-time spike-domain sensory information processing system," in *Proceedings of IEEE International Symposium on Circuits and Systems*, vol. 3, 2005, pp. 1919–1922.
- [47] S. R. Deiss, T. Delbrück, R. J. Douglas, M. Fischer, M. Mahowald, T. Matthews, and A. M. Whatley, "Address-event asynchronous local broadcast protocol," World Wide Web page, 1994, http://www.ini.unizh.ch/~amw/scx/aeprotocol.html.
- [48] A. M. Whatley, PCI-AER board Driver, Library & Documentation, Institute of Neuroinformatics, http://www.ini.unizh.ch/~amw/pciaer/.
- [49] PCI Products Data Book, SECTION 2: S5920 PCI Target Interface, AMCC, Applied Micro Circuits Corporation, 6290 Sequence Drive, San Diego, CA 92121-4358, 1998, http://www.amcc.com.
- [50] P. Lichtsteiner, T. Delbruck, and J. Kramer, "Improved ON/OFF temporally differentiating address-event imager," in 11th IEEE International Conference on Electronics, Circuits and Systems. IEEE, December 2004, pp. 211–214.
- [51] E. Chicca, G. Indiveri, and R. J. Douglas, "An event based VLSI network of integrate-and-fire neurons," in *Proceedings of IEEE International Symposium on Circuits and Systems*. IEEE, 2004, pp. V-357–V-360.
- [52] M. Oster, "Tuning aVLSI chips with a mouse click," *The Neuromorphic Engineer*, vol. 2, no. 1, p. 9, 2005, http://ine-web.org/research/newsletters/index.html.
- [53] J. Kramer, "An integrated optical transient sensor," *IEEE Transactions on Circuits and Systems II*, vol. 49, no. 9, pp. 612–628, Sep 2002.
- [54] R. Hahnloser, R. Sarpeshkar, M. Mahowald, R. J. Douglas, and S. Seung, "Digital selection and analog amplification co-exist in an electronic circuit inspired by neocortex," *Nature*, vol. 405, no. 6789, pp. 947–951, 2000.
- [55] E. Chicca, "A neuromorphic VLSI system for modeling spike-based cooperative competitive neural networks," Ph.D. dissertation, ETH Zurich, Zurich, Switzerland, April 2006.
- [56] N. V. Swindale, "Orientation tuning curves: Empirical description and estimation of parameters," *Biological Cybernetics*, vol. 78, pp. 45–56, 1998.
- [57] G. Indiveri, E. Chicca, and R. Douglas, "A VLSI array of low-power spiking neurons and bistable synapses with spike-timing dependent plasticity," *IEEE Transactions on Neural Networks*, vol. 17, no. 1, pp. 211–221, Jan 2006.
- [58] S. Mitra, S. Fusi, and G. Indiveri, "A VLSI spike-driven dynamic synapse which learns," in *Proceedings of IEEE International Symposium on Circuits and Systems*. IEEE, May 2006, pp. 2777–2780.
- [59] C. Bartolozzi and G. Indiveri, "Silicon synaptic homeostasis," in Brain Inspired Cognitive Systems 2006, 2006, submitted.
- [60] E. Chicca, P. Lichtsteiner, T. Delbruck, G. Indiveri, and R. J. Douglas, "Modeling orientation selectivity using a neuromorphic multi-chip system," in *Proceedings of IEEE International Symposium on Circuits and Systems*, 2006, (In Press).
- [61] D. Muir, "Spike toolbox," http://www.ini.unizh.ch/~dylan/spike\_toolbox/, 2005.
- [62] G. Indiveri, "VLSI reconfigurable networks of integrate-and-fire neurons with spike-timing dependent plasticity," *The Neuromorphic Engineer*, vol. 2, no. 1, pp. 4–7, 2005, http://ine-web.org/research/newsletters/index.html.
- [63] C. Bartolozzi and G. Indiveri, "Selective attention implemented with dynamic synapses and integrate-and-fire neurons," *NeuroComputing, special issue on Brain Inspired Cognitive Systems*, 2005, in press.
- [64] M. Oster and S.-C. Liu, "Spiking inputs to a winner-take-all network," in Advances in Neural Information Processing Systems (NIPS), Y. Weiss, B. Schölkopf, and J. Platt, Eds., vol. 18, Neural Information Processing Systems Foundation. Cambridge, MA: MIT Press, Dec 2005, pp. 1051–1058.

September 28, 2006

### LIST OF FIGURES

| 1 | PCI-AER board and header board. (a) PCI-AER PCI board. The devices involved in the implementation of the three major functional blocks (see Fig. 2) are highlighted; monitor sequencer, and mapper FIFOs |    |
|---|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
|   | the two FPGAs, and the SRAM used to hold the mapper's look-up table. A 68-way connector is used                                                                                                          |    |
|   | to connect to the header board (see Fig. 1(b)) via a cable. The PCI interface chip controls the interface                                                                                                |    |
|   | to the host PC. (b) PCI-AER header board. The header board connects to the PCI board via a 68-way                                                                                                        |    |
|   | cable. Cable drivers are used to ensure the integrity of the signals passing over this cable. The board                                                                                                  |    |
|   | can draw power either from the PCI board on from an external power supply (The connector and                                                                                                             |    |
|   | associated components for this are highlighted.) The board provides connectors for up to four AFR                                                                                                        |    |
|   | senders and four AER receivers plus a dedicated connector from the sequencer. The header heard has                                                                                                       |    |
|   | five status LEDs: a 'nower on' LED, three LEDs which indicate whether each of the three functional                                                                                                       |    |
|   | blocks (monitor, sequencer and mapper) are enabled, and three EIEO full indicator I EDs (one each                                                                                                        |    |
|   | for the monitor, sequencer and mapper) are chaoled, and three THO full indicator LEDS (one cach                                                                                                          | 10 |
| r | Block diagram of the DCI AED interface beard showing its three major functional blocks <i>i.e.</i> the                                                                                                   | 10 |
| 2 | MONITOP the SEQUENCEP and the manner (divided into MADDED IN and MADDED OUT) These                                                                                                                       |    |
|   | (and other blocks) are implemented in two EPGAs. Also shown are the EIEOs, the interface from the                                                                                                        |    |
|   | PCI bus to the local bus provided by an AMCC \$5020 chin [40] the \$P AM used to hold the mapper's                                                                                                       |    |
|   | $1  Crows to the local bus provided by an Aire Cost 20 cmp [\pm2], the SKAW used to hold the mapper s$                                                                                                   | 10 |
| 3 | AFR orientation selectivity system setup. The PCLAFR hoard routes output events of the TMPDIFF                                                                                                           | 1) |
| 5 | chip in response to visual stimuli to the IFWTA chip and monitors the activity of both chips. The PC                                                                                                     |    |
|   | controls the LCD screen for stimulus presentation, the PCI-AER board and the DAC board                                                                                                                   | 20 |
| 4 | (a) Chip architecture. Squares represent excitatory (E) and inhibitory (I) synapses; small unlabeled trapezoids represent I&F neurons. The I&F neurons can transmit their spikes off-chip and/or to locally connected synapses (see text for details). (b) Schematic representation of the connectivity pattern implemented by the internal hardwired connections (closed boundary condition). Empty circles represent excitatory neurons and the filled circle represents the global inhibitory neuron. Solid/dashed lines represent excitatory/inhibitory connections. Connections with arrowheads are unidirectional; all the others are bidirectional. Only 8 excitatory neurons are shown for simplicity; the actual chip contains 31 excitatory neurons. |    |
| 5 | Sketch of the mapping from the TMPDIFF chip to the IFWTA chip. The TMPDIFF retina is represented by a twelve-by-twelve array of pixels; lines represent excitatory connections from the TMPDIFF chip to neurons of the IFWTA chip (represented by circles). |    |
| 6 | Tuning curves for the feed-forward (dashed line) and the feed-back (solid line) model of orientation selectivity. (a) The mean frequency (Hz) of each neuron is plotted as a function of stimulus orientation (the scales are the same for all plots and can be seen in (b)). The top left graph shows the activity of the inhibitory neuron; the other graphs show the activity of the excitatory neurons (a bar representing the retinal pixels mapped to the neuron, *i.e.* its preferred orientation, is shown in each plot). The tuning curves of the feed-back model have a larger amplitude and a smaller half-width at half-height than those of the feed-forward model. (b) Tuning curves for the feed-forward (dashed line and filled circles) and the feed-back (solid line and empty circles) model of orientation selectivity for the neuron with vertical preferred orientation (enlargement of the second panel from the left in the first row of (a)). The lines represent the von Mises functions fitted to the data, which are shown as circles with error bars (standard deviation of the measured mean frequency). |    |
| 7 | (a) Population data for the amplitude of the tuning curve at the preferred orientation (feed-back versus feed-forward model). (b) Population data for the half-width at half-height of the tuning curve (feed-back versus feed-forward model). |    |
| 8 | Raster plot of the response of the TMPDIFF pixels and IFWTA neurons to a vertical bar. The graphs in the top row show the ON (black dots) and OFF (grey dots) responses of the TMPDIFF chip (top) and the response of the IFWTA chip (bottom) to two cycles of the flashing stimulus. The graphs in the bottom row show magnified views of one of the bursts in the top-row graphs. In the left column the IFWTA chip is configured to implement the purely feed-forward model; in the right column it implements the recurrent network described in the text. These graphs show that the orientation selectivity system produces bursts of activity in response to the appearance and disappearance of the flashing bar, and no activity while the stimulus is static. As shown by the two graphs in the bottom row, the response delays of the feed-forward and feed-back networks are comparable. |    |



Fig. 1. PCI-AER board and header board. (a) PCI-AER PCI board. The devices involved in the implementation of the three major functional blocks (see Fig. 2) are highlighted: monitor, sequencer, and mapper FIFOs, the two FPGAs, and the SRAM used to hold the mapper's look-up table. A 68-way connector is used to connect to the header board (see Fig. 1(b)) via a cable. The PCI interface chip controls the interface to the host PC. (b) PCI-AER header board. The header board connects to the PCI board via a 68-way cable. Cable drivers are used to ensure the integrity of the signals passing over this cable. The board can draw power either from the PCI board or from an external power supply (the connector and associated components for this are highlighted). The board provides connectors for up to four AER senders and four AER receivers, plus a dedicated connector for the sequencer output. The header board has seven status LEDs: a 'power on' LED, three LEDs which indicate whether each of the three functional blocks (monitor, sequencer and mapper) is enabled, and three FIFO-full indicator LEDs (one each for the monitor, sequencer and mapper FIFOs).



Fig. 2. Block diagram of the PCI-AER interface board showing its three major functional blocks, *i.e.* the MONITOR, the SEQUENCER, and the mapper (divided into MAPPER-IN and MAPPER-OUT). These (and other blocks) are implemented in two FPGAs. Also shown are the FIFOs, the interface from the PCI bus to the local bus provided by an AMCC S5920 chip [49], the SRAM used to hold the mapper's look-up table, and the interconnecting buses.
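
The mapper translates each incoming address event into outgoing events using a look-up table held in SRAM. As a minimal sketch, assuming that table can be modelled as a one-to-many map from a source address to a list of destination addresses (the actual SRAM layout is not described here, and the class and method names are hypothetical), the routing step looks like this:

```python
from collections import defaultdict


class AddressMapper:
    """Hypothetical software model of a one-to-many AER look-up-table mapper."""

    def __init__(self):
        # Source address -> list of destination addresses (stands in for the SRAM table).
        self._table = defaultdict(list)

    def connect(self, src, dst):
        """Add one destination for a source address."""
        self._table[src].append(dst)

    def map_events(self, events):
        """Translate (timestamp, src) events into (timestamp, dst) events.

        Each incoming event is fanned out to every destination stored for its
        source address; events whose address has no entry are dropped.
        Timestamps are simply carried through unchanged.
        """
        out = []
        for t, src in events:
            # .get() does not insert missing keys into the defaultdict.
            for dst in self._table.get(src, ()):
                out.append((t, dst))
        return out
```

In this sketch, routing retina output to the IFWTA chip amounts to filling the table once and streaming monitored events through `map_events`; the one-to-many form lets a single retinal pixel drive several neurons.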



Fig. 3. AER orientation selectivity system setup. The PCI-AER board routes the output events generated by the TMPDIFF chip in response to visual stimuli to the IFWTA chip and monitors the activity of both chips. The PC controls the LCD screen used for stimulus presentation, the PCI-AER board, and the DAC board.



Fig. 4. (a) Chip architecture. Squares represent excitatory (E) and inhibitory (I) synapses; small unlabeled trapezoids represent I&F neurons. The I&F neurons can transmit their spikes off-chip and/or to locally connected synapses (see text for details). (b) Schematic representation of the connectivity pattern implemented by the internal hardwired connections (closed boundary condition). Empty circles represent excitatory neurons and the filled circle represents the global inhibitory neuron. Solid/dashed lines represent excitatory/inhibitory connections. Connections with arrowheads are unidirectional; all the others are bidirectional. Only 8 excitatory neurons are shown for simplicity; the actual chip contains 31 excitatory neurons.
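
The closed boundary condition in Fig. 4(b) means the excitatory neurons form a ring in which the first and last neurons are neighbours. A minimal sketch of such a connection list, assuming nearest-neighbour excitatory coupling with a hypothetical `radius` parameter and a global inhibitory neuron that is excited by, and inhibits, every excitatory neuron; the caption does not say which links are unidirectional, so this illustrates the topology rather than the chip's exact wiring:

```python
def ring_connectivity(n_exc=31, radius=1):
    """Return (excitatory_edges, inhibitory_edges) as lists of (pre, post) pairs.

    Excitatory neurons are indexed 0..n_exc-1 and the global inhibitory
    neuron gets index n_exc. Neighbour indices wrap around modulo n_exc,
    which implements the closed (ring) boundary condition.
    """
    inh = n_exc
    exc_edges = []
    for i in range(n_exc):
        for k in range(1, radius + 1):
            exc_edges.append((i, (i + k) % n_exc))  # neighbour on one side
            exc_edges.append((i, (i - k) % n_exc))  # neighbour on the other side
    # Every excitatory neuron drives the global inhibitory neuron...
    exc_edges += [(i, inh) for i in range(n_exc)]
    # ...which in turn inhibits all excitatory neurons.
    inh_edges = [(inh, i) for i in range(n_exc)]
    return exc_edges, inh_edges
```

With the 8 neurons drawn in the figure, `ring_connectivity(8)` links neuron 0 to neuron 7, which is exactly what the wrap-around provides.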




Fig. 5. Sketch of the mapping from the TMPDIFF chip to the IFWTA chip. The TMPDIFF retina is represented by a twelve-by-twelve array of pixels; lines represent excitatory connections from the TMPDIFF chip to neurons of the IFWTA chip (represented by circles).
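
A mapping like the one in Fig. 5 can be generated programmatically. As a hedged sketch, assuming each neuron receives input from a rasterized bar of pixels through the centre of the 12 x 12 array at its preferred orientation, with orientations evenly spaced over 180° (the actual bar widths and pixel sets are those drawn in the figure and may differ; function names are illustrative):

```python
import math


def bar_pixels(theta_deg, size=12, half_width=0.5):
    """Return (x, y) pixels whose centres lie within half_width pixels of a
    straight line through the centre of a size x size array.

    theta_deg is the line orientation measured from the x (column) axis.
    """
    c = (size - 1) / 2.0  # array centre in pixel coordinates
    ux = math.cos(math.radians(theta_deg))
    uy = math.sin(math.radians(theta_deg))
    pixels = []
    for y in range(size):
        for x in range(size):
            dx, dy = x - c, y - c
            # Perpendicular distance from the pixel centre to the line.
            dist = abs(-uy * dx + ux * dy)
            if dist <= half_width + 1e-9:  # tolerance for floating-point rounding
                pixels.append((x, y))
    return pixels


def retina_to_wta_map(n_neurons=31, size=12):
    """Map each pixel to the list of neurons whose bar contains it."""
    table = {}
    for k in range(n_neurons):
        for px in bar_pixels(180.0 * k / n_neurons, size):
            table.setdefault(px, []).append(k)
    return table
```

Pixels near the array centre belong to several bars, so in this sketch they project to several neurons, which is why a one-to-many look-up table is the natural representation.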



Fig. 6. Tuning curves for the feed-forward (dashed line) and the feed-back (solid line) model of orientation selectivity. (a) The mean frequency (Hz) of each neuron is plotted as a function of stimulus orientation (the scales are the same for all plots and can be seen in (b)). The top left graph shows the activity of the inhibitory neuron; the other graphs show the activity of the excitatory neurons (a bar representing the retinal pixels mapped to the neuron, *i.e.* its preferred orientation, is shown in each plot). The tuning curves of the feed-back model have a larger amplitude and a smaller half-width at half-height than those of the feed-forward model (see Table II). (b) Tuning curves for the feed-forward (dashed line and filled circles) and the feed-back (solid line and empty circles) model of orientation selectivity for the neuron with vertical preferred orientation (enlargement of the second panel from the left in the first row of (a)). The lines represent the von Mises functions fitted to the data, which are shown as circles with error bars (standard deviation of the measured mean frequency).
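
The tuning curves are summarized by fitted von Mises functions, but the exact parameterization is not stated here. A minimal sketch, assuming the common orientation form with a 180° period, $r(\theta) = B + A\,e^{\kappa[\cos 2(\theta-\theta_{\mathrm{pref}}) - 1]}$, with baseline $B$, amplitude $A$ (peak response above baseline), concentration $\kappa$ and preferred orientation $\theta_{\mathrm{pref}}$. Under this assumption, setting the exponential term to $1/2$ gives the half-width at half-height above baseline in closed form, $\theta_{0.5} = \tfrac{1}{2}\arccos(1 - \ln 2/\kappa)$ (function names are illustrative):

```python
import math


def von_mises_tuning(theta_deg, baseline, amplitude, kappa, pref_deg):
    """Assumed von Mises orientation tuning curve (period 180 degrees).

    Equals baseline + amplitude at the preferred orientation and decays
    towards baseline + amplitude * exp(-2 * kappa) at the orthogonal one.
    """
    delta = math.radians(theta_deg - pref_deg)
    return baseline + amplitude * math.exp(kappa * (math.cos(2.0 * delta) - 1.0))


def half_width_at_half_height(kappa):
    """Half-width (degrees) at which the tuned part falls to half its peak.

    Solves exp(kappa * (cos(2d) - 1)) = 1/2, i.e. cos(2d) = 1 - ln(2)/kappa.
    A solution exists only for kappa >= ln(2)/2.
    """
    c = 1.0 - math.log(2.0) / kappa
    if c < -1.0:
        raise ValueError("kappa too small: curve never falls to half height")
    return math.degrees(0.5 * math.acos(c))
```

Larger $\kappa$ gives a smaller $\theta_{0.5}$, so in this parameterization a sharper tuning curve corresponds directly to a larger fitted concentration.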



Fig. 7. (a) Population data for the amplitude of the tuning curve at the preferred orientation (feed-back versus feed-forward model). (b) Population data for the half-width at half-height of the tuning curve (feed-back versus feed-forward model).



Fig. 8. Raster plot of the response of the TMPDIFF pixels and IFWTA neurons to a vertical bar. The graphs in the top row show the ON (black dots) and OFF (grey dots) responses of the TMPDIFF chip (top) and the response of the IFWTA chip (bottom) to two cycles of the flashing stimulus. The graphs in the bottom row show magnified views of one of the bursts in the top-row graphs. In the left column the IFWTA chip is configured to implement the purely feed-forward model; in the right column it implements the recurrent network described in the text. These graphs show that the orientation selectivity system produces bursts of activity in response to the appearance and disappearance of the flashing bar, and no activity while the stimulus is static. As shown by the two graphs in the bottom row, the response delays of the feed-forward and feed-back networks are comparable.
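
One simple way to quantify the response delay compared above is the time between the first retina event and the first IFWTA event after a stimulus change. A hedged sketch, assuming sorted lists of monitored timestamps for each chip and a known stimulus-change time (the measure itself is an illustrative choice, not necessarily the one used for the figure):

```python
import bisect


def burst_onset(timestamps, t_start):
    """First timestamp at or after t_start in a sorted list, or None."""
    i = bisect.bisect_left(timestamps, t_start)
    return timestamps[i] if i < len(timestamps) else None


def response_delay(retina_ts, wta_ts, stimulus_change):
    """Delay from the first retina event to the first IFWTA event after a stimulus change."""
    r = burst_onset(retina_ts, stimulus_change)
    w = burst_onset(wta_ts, stimulus_change)
    if r is None or w is None:
        return None  # one of the chips did not respond after the change
    return w - r
```

Because the feed-forward configuration has a non-zero baseline rate, a spontaneous IFWTA spike can occasionally precede the stimulus-driven burst, so in practice the onset would be taken from several events or averaged over stimulus cycles.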

### LIST OF TABLES

| No. | Caption |
|-----|---------|
| I   | PCI-AER board and driver performance data. |
| II  | Parameters obtained by least-squares fitting of the data to the von Mises distribution: mean and standard deviation over the population of 31 orientation-selective neurons. |

### TABLE I PCI-AER BOARD AND DRIVER PERFORMANCE DATA.

| Parameter                                               | Value                                |
|---------------------------------------------------------|--------------------------------------|
| Monitor max sustainable rate (without FIFO overruns)    | $\approx 420$ k events s$^{-1}$      |
| Monitor min AE cycle time                               | < 280 ns                             |
| Sequencer max sustainable rate (without FIFO underruns) | 1.0 M events s$^{-1}$                |
| Sequencer max instantaneous rate                        | $\approx 1.1$ M events s$^{-1}$      |
| Sequencer min AE cycle time                             | $\approx 240$ ns                     |
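
The rates in Table I are easier to compare with timing figures as mean inter-event intervals: 1.0 M events s$^{-1}$ corresponds to 1 µs per event and $\approx 420$ k events s$^{-1}$ to about 2.4 µs. The sketch below also illustrates why "max sustainable rate" is tied to FIFO overruns: any sustained input rate above the drain rate fills a FIFO in finite time. The FIFO depth passed in is a hypothetical parameter, not the board's actual depth:

```python
import math


def mean_interval_ns(rate_events_per_s):
    """Mean inter-event interval in nanoseconds for an average event rate."""
    return 1e9 / rate_events_per_s


def time_to_overrun_s(fifo_depth, rate_in, rate_out):
    """Seconds until an initially empty FIFO of fifo_depth events overruns.

    Events arrive at rate_in and are drained at rate_out (both in events/s).
    Returns math.inf when the FIFO never fills (rate_in <= rate_out).
    """
    if rate_in <= rate_out:
        return math.inf
    return fifo_depth / (rate_in - rate_out)
```

For example, a burst arriving 100 k events s$^{-1}$ faster than the drain rate would fill a hypothetical 1000-event FIFO in 10 ms, so short bursts above the sustainable rate can be absorbed while longer ones cannot.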

### TABLE II PARAMETERS OBTAINED BY LEAST-SQUARES FITTING OF THE DATA TO THE VON MISES DISTRIBUTION: MEAN AND STANDARD DEVIATION OVER THE POPULATION OF 31 ORIENTATION-SELECTIVE NEURONS.

| Parameter                                    | Feed-forward model: mean | Feed-forward model: std. dev. | Feed-back model: mean | Feed-back model: std. dev. |
|----------------------------------------------|--------------------------|-------------------------------|-----------------------|----------------------------|
| Amplitude (Hz)                               | 10                       | 2                             | 19                    | 4                          |
| Half-width at half-height $\theta_{0.5}$ (°) | 21                       | 2                             | 19                    | 2                          |
| Baseline activity (Hz)                       | 1.7                      | 0.6                           | 0.07                  | 0.11                       |
| Preferred orientation error (°)              | 3                        | 2                             | 3                     | 2                          |
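
The fitting procedure behind Table II is not detailed here. As an assumption-laden sketch using only the standard library, and assuming the model $B + A\,g(\theta)$ with $g(\theta) = e^{\kappa[\cos 2(\theta-\theta_{\mathrm{pref}}) - 1]}$, the fit below exploits the fact that for fixed $\kappa$ and $\theta_{\mathrm{pref}}$ the model is linear in baseline $B$ and amplitude $A$, so those follow in closed form from the 2x2 normal equations while $\kappa$ and $\theta_{\mathrm{pref}}$ are searched on a hypothetical grid. A helper for the preferred-orientation error respects the 180° periodicity of orientation:

```python
import math


def tuning_shape(theta_deg, kappa, pref_deg):
    """Unit-amplitude von Mises bump for orientation (period 180 degrees)."""
    d = math.radians(theta_deg - pref_deg)
    return math.exp(kappa * (math.cos(2.0 * d) - 1.0))


def fit_von_mises(thetas, rates, kappas, prefs):
    """Least-squares fit of rate = B + A * shape(theta; kappa, pref) on a grid.

    Returns (baseline, amplitude, kappa, pref) with the smallest sum of
    squared residuals over all (kappa, pref) grid points.
    """
    n = len(thetas)
    sy = sum(rates)
    best = None
    for kappa in kappas:
        for pref in prefs:
            g = [tuning_shape(t, kappa, pref) for t in thetas]
            sg = sum(g)
            sgg = sum(x * x for x in g)
            sgy = sum(x * y for x, y in zip(g, rates))
            det = n * sgg - sg * sg
            if abs(det) < 1e-12:
                continue  # nearly flat shape: A and B cannot be separated
            amp = (n * sgy - sg * sy) / det  # closed-form normal-equation solution
            base = (sy - amp * sg) / n
            sse = sum((y - base - amp * x) ** 2 for x, y in zip(g, rates))
            if best is None or sse < best[0]:
                best = (sse, base, amp, kappa, pref)
    if best is None:
        raise ValueError("no identifiable grid point")
    return best[1:]


def orientation_error(fitted_deg, nominal_deg):
    """Absolute orientation difference in degrees, modulo 180."""
    d = abs(fitted_deg - nominal_deg) % 180.0
    return min(d, 180.0 - d)
```

A practical fit would refine the grid optimum with a continuous optimizer; the grid search is used here only to keep the sketch dependency-free.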