Chapter 2

Chapter 2 The Embryonics Project

The objective of the work presented in this thesis is the development of a programmable circuit (FPGA) which will integrate some properties which are more commonly associated with biological organisms: self-repair (healing) and self-replication (cellular division). We try to achieve this goal by applying bio-inspired mechanisms to digital electronics, an approach which provides a set of useful guidelines for the development of our system, but also introduces some peculiar constraints which are not commonly encountered in traditional circuit design.

In order to understand some of the choices we made in our design, it is therefore necessary to consider the biological inspiration of our project. This section aims at providing the required background by presenting an overview of the Embryonics project and of biological inspiration (section 2.1) and then narrowing the focus on the more specific features of bio-inspired hardware (section 2.2).

2.1 Biological Inspiration in Embryonics

In this section we want to provide an overview of the approach we adopted in the Embryonics project in order to adapt biological concepts and mechanisms to the world of electronics. After introducing the project and its main goals, we will present our approach to biological inspiration, and notably the model we created to try and unify bio-inspired systems into a common framework.

2.1.1 Overview of the Project

Embryonics [60, 62, 89, 94] is a long-term research project, conceived not so much to achieve a specific goal, but rather to look for insights by applying new concepts to a known field. In our case, we try to obtain interesting results by applying biological concepts (i.e., concepts which are usually associated with biological processes) to computing, and notably to the design of digital hardware.

The analogy between biology and electronics is not as farfetched as it might appear at a first glance. Aside from the more immediate parallel between the human brain and the computer, which has led to the development of fields such as artificial intelligence and is the inspiration for many applications (such as, for example, pattern recognition), a certain degree of similarity exists between the genome(the hereditary information of an organism) and a computer program.

The genome consists of a uni-dimensional string of data encoded in a base-4 system. The DNA (Deoxyribonucleic Acid), the macromolecule in which the genome is encoded, is a sequence of four bases: A (Adenine), C (Cytosine), G (Guanine), and T (Thymine). The information stored on the DNA is chemically decoded and interpreted to determine the function of a cell. A computer program is a uni-dimensional string of data encoded in a base-2 system (0 and 1). Stored in an electronic memory circuit, it is interpreted to determine the function of a processor.

Of course, carbon-based biology and silicon-based computing are different enough that no straightforward one-to-one relationship between the genome and a computer program (and indeed between any biological and computing process) can be established, except at a very superficial level. However, through careful interpretation, some basic biological concepts can be adapted to the design of computer systems, and some biological processes are indeed extremely interesting from a computer designer's point of view: for example, an organism's robustness (achieved by its healing processes) is unequaled in electronic circuits, while the natural process of evolution has produced organisms of a complexity which far exceeds that of modern computer systems.

It should be obvious that, biology being a very vast field, the biological concepts which could potentially be adapted to computer hardware are numerous, as are the approaches to applying such concepts to electronics. The first step in such an effort should therefore be to identify which aspects of the biological world will be studied, and define a unified framework which can meld them into a single project.

2.1.2 Bio-Inspired Systems and the POE Model

In recent years, the barrier dividing the worlds of biology and electronics has considerably thinned. More and more, electronics are exploited in biology and medicine, while biological ideas are starting to cross over into computer design.

Even if the first direction is, so far, definitely dominant (computers are quickly becoming indispensable tools for medical and biological applications ranging from computerized axial tomography to genomic sequence analysis), biology is starting to inspire engineers to develop quasi-biological systems. Among the best-known examples of such systems, we can mention:

Expert systems[44], very complex software programs which try to imitate the high-level processes of human intelligence through the use of dedicated algorithms. A practical application of artificial intelligence, they have produced remarkable results in a variety of fields.
Neural networks[9, 36], arrays of small processing elements directly inspired by the human nervous system (more below), capable of "learning", i.e. of modifying their own structure in response to a set of input patterns, somewhat analogously to the way synapses are modified by external (sensorial) stimuli.
Genetic algorithms[65] and evolutionary computation[64], which exploit some of the mechanisms of natural evolution (such as mutations and sexual reproduction) in the hope of evolving a solution for computational problems which are either too complex or too ill-defined to be solved with conventional approaches.
Computer viruses[91], tiny (but extremely crafty) computer programs which attempt to invade and sometimes destroy a computer system using strategies very similar to those used by biological viruses to invade and sometimes kill an organism [46].

Obviously, biological inspiration in computer science can assume a variety of different forms, and Embryonics does not attempt to cover all its possible aspects. Nevertheless, the scope of the project is such that it does involve many of the traditional bio-inspired systems, as well as the development of novel approaches. To illustrate the scope of the project, we have created the POE model [84, 90], an attempt to classify bio-inspired systems, according to their biological source of inspiration, along three axes: phylogeny, ontogeny, and epigenesis (Fig. 2-1).

2.1.3 The POE Model: Phylogeny

Phylogeny, n.: The racial history or evolutionary development of any plant or animal species [107].

On the phylogenetic axis we find systems inspired by the processes involved in the evolution of a species through time, i.e. the evolution of the genome. The process of evolution is based on alterations to the genetic information of a species, occurring through two basic mechanisms: crossover and mutation.

Crossover (or recombination) is directly related to the process of sexual reproduction. When two organisms of the same species reproduce, the offspring contains genetic material coming from both parents, and thus becomes a unique individual, different from either parent. Mutation consists of random alterations to the genome caused either by external phenomena (i.e., radiation or cosmic rays) or by chemical faults which occur when the two genomes merge. Often resulting in non-viable offspring², mutations are nevertheless vital for the process of evolution, as they allow "leaps" in evolution which would be impossible to achieve by merging the genomes of individuals of the same species.

Both these mechanisms are, by nature, non-deterministic. This represents both their strength and their weakness, when applied to the world of electronics. It is a strength, because they are fundamentally different from traditional algorithms and thus are potentially capable of solving problem which are intractable by deterministic approaches. It is a weakness, because computers are inherently deterministic (it is very difficult, for example, to generate a truly random number, a basic requirement for non-deterministic computation, in a computer).

Even with this disadvantage, algorithms which exploit phylogenetic mechanisms are carving themselves a niche in the world of computing. These algorithms, commonly referred to as evolutionary algorithms(Fig. 2-2) (a label which regroups domains such as genetic algorithms [65], evolutionary programming [28], and genetic programming [48, 49]), are usually applied to problems which are either too ill-defined or intractable by deterministic approaches, and whose solutions can be represented as a finite string of symbols (which thus becomes the equivalent of the biological genome). An initial, random population of individuals (i.e., of genomes), each representing a possible solution to the problem, is iteratively "evolved" through the application of mutation (i.e., random alterations of a sequence) and crossover (i.e., random mergings of two sequences). The resulting sequences are then evaluated on the basis of their efficiency in solving the given problem (the fitness function) and the best (fittest) solutions are in turn evolved. This approach is not guaranteed to find the best possible solution to a given problem, but can often find an "acceptable" solution more efficiently than deterministic approaches. Our laboratory is very active in this domain [86, 87, 88].

Another, less conventional but very interesting example of phylogenetic computer system is Tierra [81], an experimental project in which a "population" of programs is allowed to freely evolve inside a computer's memory. The very simple fitness criterion is survival in an environment with limited resources (in this case, the limited resource is the amount of memory available). Some very interesting and unexpected survival strategies emerged spontaneously, including parasitism (where tiny programs, which would not be able to survive by themselves, exploit the code belonging to other programs) and the evolution more and more compact programs (smaller programs are more likely to survive in an environment where memory is the limited resource).

It appears, then, that the phylogenetic axis has already provided a considerable amount of inspiration to the development of computer systems. To date, however, its impact has been felt mostly in the development of software algorithms, and only marginally in the conception of digital hardware. Koza et al. pioneered the attempt to apply evolutionary strategies to the synthesis of electronic circuits when they applied genetic algorithms to the evolution of a three-variable multiplexer and of a two-bit adder [50]. Also, evolutionary strategies have been applied to the development of the control circuits for autonomous robots [26, 27]. Unfortunately, technical issues have posed severe obstacles to the progress of this kind of approach, and few groups are currently active in this domain [37, 38, 100]. Our laboratory contributed to the research with the development of Firefly [33], a machine capable of evolving a solution to the problem of synchronizing a uni-dimensional cellular automaton (see the next chapter for a more detailed description of cellular automata). This relatively simple task represents nevertheless the first example of hardware capable of evolving completely on-line, i.e. without the assistance of a computer.

2.1.4 The POE Model: Epigenesis

Epigenesis, n.: The theory that the germ cell is structureless and that the embryo develops as a new creation through the action of the environment on the protoplasm [107].

The human genome contains approximately 3x10⁹bases. An adult human body contains something like 6x10¹³cells, of which approximately 10¹⁰ are neurons, with 10¹⁴ connections. Obviously, the genome cannot contain enough information to completely describe all the cells and synaptic connections of an adult organism [21].

There must therefore exist a process which allows the organism to increase in complexity as it develops. Since we already know that the additional information cannot come from within the cell itself, the only possible source of additional information is the outside world: the growth and development of an organism must be influenced by the environment, a process which is most evident in the development of the nervous, immune, and endocrine systems. Following A. Danchin's terminology [21, 22], we have labeled this process epigenesis³.

Epigenetic mechanisms have already had considerable impact on computer science, and particularly on software design, notably through the concept of learning. The parallel between a computer and a human brain dates to the very earliest days of the development of computing machines, and led to the development of the field known as artificial intelligence[108].

AI, which probably knew its greatest popularity in the seventies, tried to imitate the high-level processes of the human mind using heuristic approaches. After a number of years in the spotlight, this field seems today to have lost momentum, but not without leading to useful applications (such as expert systems) and major contributions to the development of novel algorithms (for example, IBM's well-known Deep Blue computer exploits algorithms largely derived from AI research).

Hardware systems based on epigenetic processes are starting to emerge, in the form of artificial neural networks (ANNs) [9, 36], two-dimensional arrays of processing elements (the neural cells) interconnected in a relatively complex pattern (Fig. 2-3). Each cell receives a set of input signals from its neighbors, processes them according to a given, implementation-dependent function, and propagates the results through the network. This process is a good approximation of the mechanisms exploited by biological neurons (for example, as in nature, some inputs signals have a greater impact, or weight, in the computation of a neuron's output value).

The first phase in the operation of this kind of circuit is the learning phase: a set of inputs is applied to the system, the outputs are compared to a set of correct responses, and a feedback process modifies the weight of the connections and the function of the processing elements until the network has learned to "behave" correctly (i.e., to produce the correct output for most, is not all, input patterns). The feedback process is hidden from the user, who therefore cannot directly modify the weights or the functions.

ANNs cannot be expected to perform correctly (i.e. to produce the correct output) for allinput patterns, a drawback which prevents their use in many applications. However, they have proved their worth (to the point that they are starting to be adopted in commercial systems) in applications such as voice and character recognition, where a limited margin of error is acceptable.

Our laboratory is also involved in research along the epigenetic axis, with the development of FAST (Flexible Adaptable-Size Topology) [74, 75, 76], a neural network with a dynamically reconfigurable structure. Traditional ANNs have a fixed interconnection structure, and only the weights associated with the connections can be modified. By implementing the network using reconfigurable logic (see subsection 2.2.2 ), FAST achieves on-line learning capabilities and can exploit an adaptable topology (two features which are essential to obtain true learning systems, as opposed to "learned" ones). While FAST is not the first neural network based on an adaptable topology [30, 67], it is unique in that it does not require intensive computation to reconfigure its structure, and can thus exist as a stand-alone machine which could be used, for example, in real-time control applications.

2.1.5 The POE Model: Ontogeny

Ontogeny, n.: The life cycle of a single organism; biological development of the individual [107].

The phylogenetic and epigenetic axes of the POE model cover the great majority of existing bio-inspired systems. The development of a multi-cellular biological organism, however, involves a set of processes which do not belong to either of these two axes. These processes correspond to the growth of the organism, i.e. to the development of an organism from a mother cell (the zygote) to the adult phase. The zygote divides, each offspring containing a copy of the genome ( cellular division ). This process continues (each new cell divides, creating new offspring, and so on), and each newly formed cell acquires a functionality (i.e., liver cell, epidermal cell, etc.) depending on its surroundings, i.e., its position in relation to its neighbors ( cellular differentiation ).

Cellular division is therefore a key mechanism in the growth of multi-cellular organisms, impressive examples of massively parallel systems: the 6x10 13cells of a human body, each a relatively simple elements, work in parallel to accomplish extremely complex tasks (the most outstanding being, of course, intelligence). If we consider the difficulty of programming parallel computers (a difficulty which has led to a decline in the popularity of such systems), biological inspiration could provide some relevant insights on how to handle massive parallelism in silicon.

A fundamental feature of biological organisms is that each cell contains the blueprint for the entire organism (the genome), and thus can potentially assume the functionality of any other cell: no single cell is indispensable to the organism as a whole. In fact, cells are ceaselessly being created and destroyed in an organism, a mechanism at the base of one of the most interesting properties of multi-cellular organisms: healing.

The mechanisms of ontogeny (cellular division and cellular differentiation), unlike those of epigenesis and phylogeny, are completely deterministic, and are thus, in theory, more easily adaptable to the world of digital circuits (which is by nature deterministic). In spite of this, the ontogenetic axis has been almost completely ignored by computer scientists, despite a promising start in the fifties with the work of John von Neumann, who developed a theoretical model of a u niversal constructor, a machine capable of constructing any other machine, given its description [104]. Given a description of itself, the universal constructor can then self-replicate, a process somewhat analogous to cellular division.

We will describe von Neumann's machine, one of the fundamental sources of inspiration for the Embryonics project in general and for this work in particular, in detail in the next chapter, but we should mention that, unfortunately, electronic circuits in the 1950s were too primitive to allow von Neumann's machine to be realized, and the concept of self-replicating machines was thus set aside.

Probably the main obstacle to the development of self-replicating machines was the impossibility of physically creating self-replicating hardware. In fact, such machines require a means to transforming information (i.e. the description of a machine) into hardware, and such a means was definitely unpractical until recently. As we will see below, the introduction of programmable logic circuits, by demonstrating the feasibility of such a process, was an important step towards the development of self-replicating machines. One of the main goals of the Embryonics project is to determine if, given modern technology, Von Neumann's dream of a self-replicating machine can be realized in hardware. The work presented in this thesis represents a major step in this direction: by creating a hardware system capable of self-replication, we open the way to the practical realization of von Neumann's universal constructor.

2.2 Bio-Inspired Hardware

For a project such as Embryonics, the transition from theory to implementation is far from obvious: for every biological concept we wish to translate into hardware, we must find an equivalent concept in the world on silicon⁴. This section describes our efforts to define a parallel between the cellular structure of biological organisms and the design of computing hardware. Subsection 2.2.1 will provide a basic overview of our approach, and will be followed (subsection 2.2.2) by a short introduction to FPGAs, the programmable circuits at the heart of our system. The following subsections will then describe the three-level approach (organism, cell, molecule) we followed in the design of our bio-inspired hardware.

2.2.1 Ontogenetic Hardware

Since the ontogenetic axis, the most closely concerned by this work, is essentially based on the concept of "cell", the first step in developing ontogenetic hardware is therefore to define the electronic equivalent of a biological cell. Obviously, there exist a number of possible approaches to establish such a parallel, but if we are to maintain the analogy with biology, our electronic cell must respect the fundamental biological definitions [110]:

Cell, n.: the basic structural and functional unit of all organisms; cells may exist as independent units of life (as in monads) or may form colonies or tissues as in higher plants and animals.

Nucleus, n.: a part of the cell containing DNA and RNA and responsible for growth and reproduction.

DNA, n.: a nucleic acid consisting of large molecules shaped like a double helix; associated with the transmission of genetic information.

RNA, n.: a nucleic acid that transmits genetic information from DNA to the cytoplasm; controls certain chemical processes in the cell.

Genome, n: one haploid set of chromosomes with the genes they contain; the full DNA sequence of an organism.

In other terms, the cell is the smallest structural unit of an organism which contains the description of the entire organism, i.e., its genome, which is interpreted to control the cell's functions. One approach which allows us to respect these definitions is to implement computing systems (the artificial organisms) using an array of (relatively) small processing elements (the artificial cells), each executing the same program (the artificial genome). As we will see, this approach allows us not only to respect the fundamental definitions of a biological cell, but also to exploit some of the more specialized mechanisms on which the ontogenetic development of a cell is based.

Many of the precise mechanisms whereby the genome is accessed and interpreted inside each cell are still unknown to science⁵. Since our intention is notto imitate biology, however, such detailed knowledge is superfluous: the transition from the carbon-based world of biology to the silicon-based world of electronic circuits will necessarily generate considerable discrepancies. Undoubtedly the greatest such discrepancy is a consequence of the process of biological ontogeny itself, which is based on biology's capability to handle physical material: biological ontogenetic development is based on the physicalgrowth and replication of cells, and biological self-repair is a consequence of their physicalcreation and destruction. The state of the art in the design of electronic circuits does not allow us to act on the physical structure of our cells (i.e., the silicon) after fabrication.

This obstacle, which was a major factor in preventing the realization of von Neumann's universal constructor, would have been almost insurmountable until a few years ago, when the first FPGA circuits were developed.

2.2.2 Field-Programmable Gate Arrays

An FPGA (Field-Programmable Gate Array) [14, 103] is basically a chip that can be configured (i.e., programmed via software) to realize any given function (that is, to implement any digital logic circuit⁶). They are two-dimensional arrays of logic elements⁷and, while the exact structure of an FPGA element can vary considerably from one manufacturer to the next, certain essential features are constant (Fig. 2-4):

Each element can implement a programmable function, usually consisting of combinational logic plus one or more memory elements (flip-flops) for sequential behavior. The complexity and the structure of the programmable function can vary considerably from one type of FPGA to the next (for example, the combinational part can be implemented using a lookup table, as in the Xilinx XC4000 [111] and XC5200 [112] families, or be hard-wired, as in the Xilinx XC6200 [113] family).
Communication between the elements is handled through programmable connections, again of varying complexity depending on the type of FPGA. Experience has shown that connections (rather than functionality) are the main bottleneck for the layout of FPGA circuits, an observation which has led most designers to add long-distance connections distributed inhomogenously throughout the array. As we will see below, this lack of homogeneity is one of the major factors in our decision to design a novel FPGA for the Embryonics project.
The functionality and the connections of an element are controlled by its configuration. The configuration bitstream (that is, the sum of all the configurations of the FPGA's elements) determines the global behavior of the chip. An FPGA can be configured to implement any digital logic circuit (provided enough elements are available or the circuit can be subdivided among different chips) and in most cases is reprogrammable (that is, its configuration can be erased and replaced by a new one, implementing a different circuit).

FPGAs are quickly becoming an essential tool for the design of complex computer circuits. In the industry, they are mainly used for rapid prototyping : manufacturing a VLSI circuit is an expensive and time-consuming endeavor, and a circuit has to be tested very extensively before it is sent to a foundry for fabrication. Most design errors can be detected through software simulation, but the speed (or lack thereof) of such simulations does not allow extensive testing of complex circuits. By using FPGAs to implement a design, the circuit can be tested at hardware speeds, and the debugging can therefore be much faster and more complete (moreover, some faults can only be observed when the circuit is tested at speed).

Obviously, the remarkable versatility of FPGAs comes at a price: speed. The programmable function of each element, being universal (i.e., capable of implementing a number of different functions), is necessarily much slower than a dedicated implementation, and the logic required to make connections programmable inevitably slows the flow of data. As a consequence, circuits implemented using FPGAs can rarely operate at very high clock rates. This limitation, while still allowing FPGAs to abundantly outperform any kind of software simulation for prototyping, is often too restrictive for FPGAs to be used in actual applications.

However, in some cases the versatility of FPGAs can overcome this shortcoming, either because the circuit is not speed-critical, but could benefit from regular upgrades or even dynamic alterations, or because the advantage of having dedicated (often parallel) processors can easily compensate the slower clock rate (for example, specific mathematical operations which would require many clock cycles in a general-purpose processor, but could be executed in a single, if slower, clock cycle in a dedicated processor). Given the prohibitive cost of developing a dedicated computer system in VLSI, using programmable logic can be a viable and cost-effective option in many applications.

Our laboratory has been involved in this kind of FPGA-based systems for many years. Among the most interesting project, we can mention:

Spyder [42, 43], a VLIW (Very Long Instruction Word) processor with programmable functional units. By dynamically adapting the structure of the functional units, Spyder is able to create what amounts to an instruction-specific processor.
GENSTORM [68], a dedicated processor for matching DNA or protein sequences with known pattern families. By implementing dedicated algorithms in hardware, GENSTORM can overcome the slower clock rate in a very computationally-intensive application.
RENCO [15, 34], a reprogrammable network computer (NC): while conventional NCs can charge an application through the network to execute it locally on a general-purpose processor, RENCO can charge the hardware configuration itself, so as to provide a dedicated processor for each application.

Since the Embryonics project is not meant to produce a commercially viable product⁸, speed of operation is not one of our main priorities. On the other hand, the reprogrammability of FPGAs is a solution to the problem of implementing an ontogenetic machine, as it provides a way to modify the hardware structure of a system by altering information, sidestepping the need to handle physical matter.

2.2.3 The Artificial Organism

By demonstrating that it is possible to modify hardware using information, the development of FPGA circuits has proved the feasibility of creating computer hardware inspired by biological ontogeny. To develop an approach to the design of such systems, we analyzed the essential features of biological organisms.

In biology, an organism is a three-dimensional array of cells, all performing their functions in parallel to give rise to global processes (i.e., processes involving the entire organism). Each cell determines its function on the basis of its position within the array (cellular differentiation), during both growth and healing. To respect the biological analogy, our electronic organism will then consist of a two-dimensional array of elements⁹working in parallel to achieve a global task, i.e. a given application.
In biology, each cell contains the entire genome, that is, the function of every cell in the organism. If we are to maintain the analogy between the genome and a computer program, this feature implies that we must regard the elements of our electronic organism as processors, each containing the same program. No single cell uses the entire genome, accessing only those portions necessary to perform its functions. Similarly, no single processor will execute all the instructions in its program, but will use its position within the array to identify which subset of the program to access¹⁰.
In biology, not all cells have access to external stimuli. Input from the outside world is limited to a (relatively) small subset of all the cells, and is propagated throughout the organism via messages passed from one cell to another. We can make use of this concept to determine a communication network for our electronic organism: since we do not need all processors to have access to external inputs and outputs, we can limit our network to local connections (for example, a processor's connections can be limited to its cardinal neighbors), thus greatly simplifying the communication network¹¹.

Drawing inspiration from biological organisms has thus led to define our organism as a two-dimensional array of processing elements, all identical in structure (each cell must be able to execute any subset of the genome program) and each executing a different part of the same program, depending on its position. At first glance, this kind of system might not seem very efficient from the standpoint of conventional circuit design: storing a copy of the genome program in each processor is redundant, since each processor will only execute a subset.

However, by accepting the weaknesses of bio-inspiration, we can also partake of its strengths. One of the most interesting features of biological organisms is their robustness, a consequence of the same redundancy which we find wasteful: since each cell stores a copy of the entire genome, it can theoretically replace any other. Thus, if one or more cells should die as a consequence of a trauma (such as a wound), they can be recreated starting from any other cell. By analogy, if one or more of our processors should "die" (as a consequence, for example, of an hardware fault), they can theoretically be replaced by any other processor in the array.

The redundancy introduced by having multiple copies of the same program thus provides an intrinsic support for self-repair, one of the main objectives of our research: by providing a set of spare cells (i.e., cells that are inactive during normal operation, but which are identical to all other cells and contain the same genome program), we are able (Fig. 2-5) to reconfigure the array around one or more faulty processors (of course, as in living beings, too many dead cells will result in the death of the entire organism).

Moreover, if the function of a cell depends on its coordinates, the task of self-replication is greatly simplified: by allowing our coordinates to cycle (Fig. 2-6) we can obtain multiple copies of an organism with a single copy of the program (provided, of course, that enough processors are available). Depending on the application and on the requirements of the user, this feature can be useful either by providing increased performance (multiple organisms processing different data in parallel) or by introducing an additional level of robustness (the outputs of multiple organisms processing the same data can be compared to detect errors).

2.2.4 The Artificial Cell

Keeping in mind the requirements of the organism, we can now determine the basic features of our electronic cell. At the hardware level, all cells must be identical: since we want our organism to be reprogrammable, we cannot fix a priori the functionality of our cell. In addition, it has to be able to store the genome program with a coordinate-dependent access mechanism. The minimal features of our cells must therefore include (Fig. 2-7):

A memory to store the genome. The size of the genome is application-dependent, and therefore we should ideally be able to configure the size of the memory.
An [X,Y]coordinate system, to allow the cell to locate its position within the array, and thus its function (Fig. 2-8). The size of the registers storing the coordinates limits the maximum size of the array. Since there should be no limit to the size of an organism, the size of the coordinate registers should be programmable.
An interpreter to read and execute the genome. The complexity of this interpreter can vary depending on the application, but once a language for the genome program has been defined (see below), we should be able to keep it more or less constant.
A functional unit, for data processing. It can contain a variety of logic elements, from a single register to a full arithmetic unit and beyond, depending on the application. While it might be more efficient to tailor the functional unit to the application, we can however execute any program with a fixed functional unit by shifting the complexity to the genome program.
A set of connections handled by a routing unit. In theory, cardinal connections (where each cell communicates with its neighbors in the four cardinal directions) are sufficient for any application. However, more complex, application-dependent patterns can be considered.

While the structure of the functional unit can vary, biological organisms provide a powerful example of complex behavior derived not from the complexity of the components, but from the parallel operation of many simple elements. One of the goals of our project is to show that biological inspiration allows us, without excessive difficulty, to design complex systems by combining very simple cells.

To illustrate the capabilities of our system, we have realized in hardware a first prototype of such a cell, known as MicTree(for tree of microinstructions) [58, 59, 60, 93], where the functional unit is a 4-bit register, the coordinate registers are 4 bits wide (limiting the maximum size of organism to 16x16 cells, sufficient for demonstration purposes, but restrictive for real applications), and communications, implemented using directional 4-bit busses, are limited to the cardinal neighbors. To write genome programs for MicTree, we have developed a small (but complete) language, complex enough to potentially implement any application, but simple enough that the interpreter need not be large. The instructions of the language are as follows:

```
 if VAR else LABEL
```
```
 goto LABEL
```
```
 do REG = DATA [on MASK]
```
```
 do X = DATA
```
```
 do Y = DATA
```
```
 do VAROUT = VARIN
```
```
 nop
```

where REGis the 4-bit register in the functional unit, Xand Yare the two 4-bit coordinates, MASK lets the bits of REGbe accessed individually, and VAROUT and VARINidentify one of the four possible I/O busses. The instructions are coded in 8-bit words, and the genome memory, of fixed size, can store up to 1024 such words. Each MicTree cell, realized using an Actel A1020B [3] FPGA (ideally suited for prototyping), was then mounted on a custom printed circuit board with a set of 7-segment displays and LEDs (Light Emitting Diodes), which in turn was inserted in an 8X8cm plastic box, the Biodule 601 (Fig. 2-9). These boxes can be fitted together, like a jigsaw puzzle, to form a two-dimensional array of cells and provide a set of neighbor-to-neighbor connections without additional cables.

In order to demonstrate the features of our cellular system, we used our prototype to implement a set of applications:

the BioWatch[60], a timer which keeps count of minutes and seconds, realized with four MicTree cells;
a random number generator[59, 60], a one-dimensional non-uniform cellular automaton, realized with five MicTree cells;
a specialized Turing machine[58, 60, 93], implementing a parenthesis checker(that is, a program which decides whether a sequence of parentheses is well-formed), realized using ten MicTree cells.

These applications, because of the limited size of our prototype, are relatively simple. Nevertheless, they are interesting in that they exhibit both the properties of self-repair (provided spare cells are available) and self-replication (if the array is large enough to contain multiple copies of the organism). In designing applications for our prototype, however, we came to realize that, while MicTree contains all of the minimal features required by our electronic cell, its fixed structure seriously limits the range of applications it can efficiently implement.

One of the most constraining factors is probably the functional unit: a single 4-bit register is too small for complex applications. Of course, the functional unit is not, by itself, constraining, as it could be compensated by using either a larger organism (cells can be very simple, if many are working in parallel) or a longer genome program. However, since both the scope of the coordinates and the size of the genome memory are also fixed, MicTree is unsuited for complex applications.

Therefore, as we suspected, we need to able to reprogram the hardware itself to create a truly ontogenetic machine. The features of the functional unit (e.g., the size of the data path) have to fit the application to reduce the complexity on the rest of the cell. The genome can have very different sizes depending on the complexity of the task (there usually is a trade-off between hardware and software, that is between the complexity of the functional unit and the length of the genome), and the size of the genome memory should therefore be programmable, a feature has an impact on the complexity of the interpreter. The size of the registers containing the coordinates depends on the size of the organism. A particular application might be more efficiently executed by a specific connection pattern. Obviously, we can design a complex enough cell that it will satisfy the greatest majority of possible applications. In fact, with a complex enough interpreter (and thus genome language) or functional part, we can be limited only by size of the genome memory. In practice, however, this solution would be very wasteful: no benefit could be achieved through tailoring processor to application, and the processors would become so complex that we would reintroduce all the drawbacks which have kept conventional multiprocessor parallel systems from gaining widespread acceptance (such as the difficulty of writing parallel code and the complexity of handling communication between processors).

Once again, biology provides a possible solution: the physical structure of a biological cell is determined by chemical processes occurring at the molecular level. Having introduced the concepts of electronic organism and of electronic cell, we now need to define the concept of electronic molecule. Fortunately, we are familiar with a kind of circuit which can be used to implement our molecular layer: FPGAs. Using programmable gate arrays as our molecules allows us to maintain the fundamental analogy with the world of biology: whereas a living cell consists of a three-dimensional array of molecules, a processor consists of a two-dimensional array of logic gates. Further, since in biology the most complex mechanisms of the cellular layer (notably, the genome program and the coordinate system) do not concern the molecular layer, the structure of our electronic molecule is not unlike that of conventional FPGA logic elements. We are therefore confronted with a three-layer system (Fig. 2-10), summarized in Table 2-1.

Table 2-1 Analogies between biological and electronic systems in Embryonics.
Biology	Electronics
Multi-cellular organism	Parallel computer system
Cell	Processor
Molecule	FPGA element

2.2.5 The Artificial Molecule

As we have seen, FPGAs are necessary for a hardware realization of our electronic cell, and they thus assume a fundamental importance for the Embryonics project. In theory, any commercially available FPGA could be used to implement one of our cells. In practice, however, the fundamental features of ontogenetic hardware can not easily implemented using conventional programmable circuits.

The first such feature concerns the basic structure of our artificial organism: a two-dimensional array of identical processing elements (cells). The dimensions of such an array can be very substantial, since a processor, however simple, is nevertheless a relatively complex circuit, and restricting the amount of programmable logic to a single chip can quickly become a serious constraint. Envisaging the use of more than one chip, however, implies the presence of a mechanism allowing designs to be easily distributed among multiple chips. Three main factors inhibit this process in conventional FPGAs:

No currently available routing software (the program which generates a configuration bitstream from the layout of a circuit) is efficient in exploiting the regularity of the array, i.e., the fact that our organism is an array of identical processing elements. This problem could be alleviated by the use of hard macros(i.e., hand-placing the functions and connections on the elements of the FPGA), an extremely time-consuming process which would have to be repeated whenever the structure of the cell changed (i.e., for every new application), and thus is not a practical option. Hopefully, future software will be able to recognize and exploit such regularity, but it is doubtful whether such a feature represents a high priority for FPGA designers.
Partitioning software (the program which divides a large design into sub-circuits to exploit multiple chips) is still at an early stage of development, and is not very efficient. As FPGAs become more and more used for prototyping complex systems, ameliorating the performance of such programs is a relatively high-priority task for developers, but it is proving difficult, mainly because of the non-homogeneous structure of commercial FPGAs.
Commercial FPGAs are not homogeneous. All currently available FPGAs have, at best, a semi-homogenous structure: the need for long-distance connections (indispensable for many kinds of circuits, but not a requirement for systolic arrays) introduces inhomogeneities in the array. This feature, so useful for many applications, is a rather serious inconvenient for our organism, where the size of the cell is application-dependent: the presence of inhomogeneities in the structure of the array imposes restrictions on the size of the cell.

The second feature of ontogenetic hardware we need to implement is healing, i.e., in a terminology more familiar to computer designers, self-repair.

As we mentioned, the cellular level, through its coordinate system, provides an intrinsic capability for self-repair. However, while in biological systems the death of a cell is a commonplace occurrence, in our electronic organism we can not afford the death of many cells: if we assume that a cell will require something in the order of a few hundred molecules, and we assume a finite number of available molecules, we obviously need to minimize the loss of cells. As a consequence, we need a mechanism to repair minor faults at the molecular level, which in turn requires a mechanism which allows self-test, i.e. the detection of faults. In fact, a prerequisite for being able to repair faults, both at the cellular and at the molecular level, is the capability of detecting the occurrence of such faults. Moreover, if the self-repair mechanism is to operate on-line (i.e., while the application is running, rather than in a separate phase), self-test must also be performed on-line.

The design of such a mechanism is a considerable challenge, particularly since the size of the FPGAs elements is such that the additional logic required by the process be minimized. However, the absence of molecular-level self-test would imply that the detection of faults be handled at the cellular level, and since the cells are application-dependent, the self-test logic would have to be included in every new cell, which would complicate their design to a considerable degree.

Providing an on-line self-test and self-repair mechanism at the molecular level is therefore, if not required, at least very desirable, since it would not only increase the robustness of the system (which would "survive" a larger number of faults), but also substantially simplify the design of the cells (by moving much of the additional logic to the application-independent molecular level). Fortunately, the task of designing such a mechanism is somewhat simplified by the presence of a second level of robustness at the cellular level, which implies that the molecular-level self-repair need not be able to handle all faults occurring in the circuit.

To efficiently implement our electronic organism, we would therefore require a completely homogenous self-repairing FPGA which can easily be configured as an array of identical processing elements (self-replication). No commercial FPGA provides the features we need, and it is doubtful whether the demand for self-repair and self-replication is sufficient to justify the additional logic in a commercial circuit, at least in the foreseeable future. As consequence, the efficient implementation of ontogenetic hardware requires the conception of a new FPGA, capable of self-replication and self-repair. The development of such an FPGA constitutes the main challenge of the ontogenetic axis of the Embryonics project, and is the focus of this thesis.

2.In fact, the great majority of mutations have no effect, as they act on unused parts of the genome.

3.This label is by no means universally accepted. We thank Prof. Marcello Barbieri of the University of Ferrara, Italy, for bringing this fact to our attention.

4.Of course, considering the different environments, the analogy will necessarily be limited to the most macroscopic features.

5.It is our secret hope that, in some indirect way, our work might be of some use to biologists by suggesting possible mechanisms for accessing and decooding the biological genome.

6.In fact, some FPGAs can implement analog circuits as well.

7.The elements of an FPGA are usually referred to as cells, but be will avoid the term so as not to engender confusion with our electronic cells.

8.At least, not directly: we do hope that some of the strategies and mechanism developed in our project will eventually, duly adapted, have repercussions on mainstream circuit design.

9.When dealing with the world of electronics circuits, the current state of the art in circuit fabrication limits us to two dimensions, but this limitation (being quantitative, rather than qualitative) is not particularly constraining, and should not have serious consequences for our project.

10.This kind of parallel systems are a particular case of what are commonly known as Single-Program Multiple-Data (SPMD) computer systems [5].

11.Complex organisms can have rather complicated mechanisms which allow long-distance communication between cells to occur, but these can be realized using local connections, with a loss of efficiency compensated by the gain in simplicity.