#### FEAST2.1 irradiation in the IRRAD facility

E. Faccio

CERN EP-ESE-ME

September 20, 2018

This report details the results of the two irradiation campaigns that took place at the IRRAD facility in May and July 2018. These were necessary to understand the failures observed in the 2017 run of the CMS pixel detector system. This work was possible thanks to the contribution of a sizeable team of individuals who are acknowledged at the end of this report; the authorship indicated here hence relates purely to the writing of this summary document.

The configuration of the IRRAD facility was purposely modified for this test. Normally the experiments at this facility use the protons from the PS, and position the samples to be irradiated directly in the proton beam. This is not suitable for our purposes, since the rate of irradiation is excessive and protons arrive in very dense packets. To perform a representative irradiation, a 1 cm thick copper target was inserted in the beam to produce a shower of particles in the surrounding room. The following images illustrate, from simulation, the energy distribution of the resulting particles and their mapped flux in the surroundings of the beam.



Figure 1: Simulation of the energy distribution and flux of the particles produced by the collision of the PS protons with the 1 cm Cu target introduced for this test.

## 1. First Irradiation Run

### A. Experimental conditions

The first run was conceived when no understanding of the failure had been reached yet, so its primary aim was to reproduce the failure observed in the CMS pixel detector. Since a test was also planned at PSI, where "high-current" samples had previously been produced (2013) in uncontrolled conditions, a common setup for both tests was developed. This guided the design of the motherboard where the DCDC modules are installed during the experiment, whose layout is illustrated in Figure 2. At PSI the proton beam has a limited energy, hence the motherboard has a hole to reduce the material along the beam path. This is the region where the exposed DCDC converters are positioned, while the 4 converters to the right in the figure are outside the beam. The purpose of the design is to observe whether only samples exposed to the PSI beam fail, or whether the failure is not directly due to the interaction of the beam with the converters.



Figure 2: Layout of the motherboard for the first irradiation run, originally intended also for PSI.

With respect to the radiation map shown in Fig.1, the dimension of the motherboard is rather large and as a consequence the DCDC modules on each motherboard are exposed to different particle fluxes. This important characteristic was very useful to understand the results of the test.

The full setup to bias, monitor and record the results was custom developed from scratch with the involvement of a number of people from ESE and from CMS. Each of the 8 modules on the motherboard is powered via a dedicated input line, along which a resistor is placed to enable the measurement of the current consumption at any time. Each converter also has a dedicated enable line, so it can be turned on or off separately at will. All the lines (power and control) linking the motherboard with the equipment in the control room are grouped in a flat cable (DB37 connector), which is connected to a dedicated "breakout board" where the routing to the power supply and to the monitoring and control instrumentation converge. Each motherboard is in fact treated as a separate acquisition unit, and in the first test we used 4 motherboards in identical conditions. Figure 3 illustrates the bias and acquisition system installed on a laboratory cart in the IRRAD control room. The full acquisition system is controlled by a PC running a dedicated LabVIEW program. All data taken during the experiment are recorded in files that are centrally stored and can be retrieved at any time, making it possible to follow the results remotely.
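The current measurement through the series resistor on each input line follows directly from Ohm's law. The snippet below is a minimal sketch of that conversion; the resistor value and the two voltage readings are hypothetical, since neither is given in this report.

```python
def input_current_ma(v_supply_side: float, v_module_side: float,
                     r_sense_ohm: float) -> float:
    """Current through a series sense resistor, in mA (Ohm's law)."""
    if r_sense_ohm <= 0:
        raise ValueError("sense resistance must be positive")
    return (v_supply_side - v_module_side) / r_sense_ohm * 1000.0


# Hypothetical readings: 1 Ohm resistor with a 57 mV drop across it.
current = input_current_ma(12.000, 11.943, 1.0)
print(f"{current:.1f} mA")
```

With these illustrative numbers the drop corresponds to about 57 mA, the order of magnitude of the enabled-state currents reported later for the 2.5V modules.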

During the first irradiation run, 4 fully populated motherboards were positioned in the irradiation area, inside a cold box where they were cooled down to -25C. This was a specificity of this irradiation test with respect to measurements accessible at other facilities: the presence of an already operational cold box where samples could be tested at a temperature similar to the one used in the CMS pixel detector system. Fig.4 shows the 4 motherboards mounted on the cover of the box, and Fig.5 the box inside the experimental area.

The converter modules used for this run were both from the FEASTMP and the Aachen/CMS production lines. These modules differ substantially in their layout and in the choice of passive components, making it essential to use both types in the experiment. Since the connector of the module is different, 2 versions of the motherboard were produced, differing only by the mating connector. The choice of exposing an equal number of modules of each design seemed natural, so 16 FEASTMP and 16 Aachen modules were part of the test, each family making use of 2 dedicated motherboards (this can be seen in Fig.4, where the different module design is clearly observable). The samples also included parts with different output voltage settings (1.5 and 2.5V); Vout, too, was found not to be a discriminating factor in the CMS failures.



Figure 3: All the instrumentation required for the test is positioned on a laboratory cart, here in the control room of IRRAD. The flat cables are the only connection to the motherboards in the beam, one flat cable per motherboard.



Figure 4: The 4 motherboards are installed on the cold box cover, ready for the exposure.



Figure 5: The closed cold box in its location inside the IRRAD experimental area. The Cu target is visible, as well as the 4 grey flat cables for the acquisition system and bias. The green cables are used for active dosimeters inside the box. Passive dosimeters were installed all around the box, on the cover.

A very important aspect of this irradiation was the condition of the modules during the test. In the CMS pixel detector, all failures were observed after a disable-enable cycle: converters would correctly provide their output voltage until a disable cycle, after which they would not turn on anymore when the enable command was sent. Therefore enable-disable cycles were needed for this irradiation. Moreover, the damage condition known as "high-current" was only detectable by the measurement of the input current at low voltage (below the UVLO threshold for enable). It was therefore important to periodically perform an I-V sweep where the current was measured at low voltage.

The final sequence used in the experiment was therefore composed of the following steps:

- measurement of the I-V curve with the converters disabled between 2.4 and 5.2V
- increase of the Vin to 12V
- sequential enable of all motherboards (4 converters share the same enable line)
- steady condition with periodic monitoring of the voltages and currents (about 1 hour: 6 measurements at 10-minute intervals)
- sequential disable of all motherboards
- monitoring of all voltages and currents
- sequential enable of all motherboards
- steady condition with periodic monitoring of the voltages and currents (about 1 hour: 6 measurements at 10-minute intervals)
- sequential disable of all motherboards
- monitoring of all voltages and currents
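The steps listed above can be written down as a simple ordered data structure. This is only an illustrative sketch: the `Step` class and the function names are hypothetical and do not reflect the actual LabVIEW control program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    vin: str        # input voltage condition during the step
    enabled: bool   # state of the converters after the step


def steady_on_block() -> list[Step]:
    """Enable, 6 monitoring points at 10-minute intervals, disable, monitor."""
    steps = [Step("sequential enable", "12V", True)]
    steps += [Step(f"monitor #{i + 1} (10 min interval)", "12V", True)
              for i in range(6)]
    steps.append(Step("sequential disable", "12V", False))
    steps.append(Step("monitor voltages and currents", "12V", False))
    return steps


def full_sequence() -> list[Step]:
    """I-V sweep with converters disabled, then two enable/disable blocks."""
    steps = [
        Step("I-V sweep, converters disabled", "2.4V to 5.2V", False),
        Step("raise Vin", "12V", False),
    ]
    return steps + steady_on_block() + steady_on_block()


sequence = full_sequence()
for step in sequence:
    print(f"{step.name:34s} Vin={step.vin:13s} enabled={step.enabled}")

# Time spent in the steady "on" condition: 12 monitoring points, 10 min each.
on_minutes = 10 * sum(1 for s in sequence if s.name.startswith("monitor #"))
print(f"steady on time per sequence: {on_minutes} min")
```

The two steady blocks add up to 120 minutes, which is where the roughly 2-hour duration of a full sequence comes from.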

Overall, the steady "on" condition with periodic measurements represented 97% of the time and a full sequence lasted about 2 hours. A graphical representation of the sequence is shown in Fig. 6. Unfortunately this sequence was sometimes not fully executed because of a communication problem between the computer and the power supply. The supply would get stuck with an output voltage of 12V until the full acquisition system could be manually reset. This happened a few times, but the influence of this problem is negligible: it only alters the first part of the sequence, when the supply is swept from 2.4 to 5.2V to measure the corresponding input current. This measurement is intended to reveal if the sample has gone into the "high current" damage state, but this can also be done by comparing the input current of the disabled module at 12V of input voltage. Therefore, the full exploitation of the results is possible even for the occasions when the supply voltage was stuck at 12V.

Still concerning the conditions of the samples during the irradiation, two more important points have to be specified. First, the modules were exposed with a very small load of 100 Ohm because this does not require any specific provision for cooling and because the failures in CMS were independent of the load. Second, there was a significant voltage drop across the long cables and the actual supply voltage at the motherboard was normally around 11-11.2V when the converters were functional. This actual voltage at the end of the line was measured with a dedicated monitoring cable without current flow (remote sensing).
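To give a feeling for how light the 100 Ohm load is, the output current and dissipated power can be computed for the two output voltage settings used in this run. This is a short worked example; the function name is hypothetical.

```python
def load_current_and_power(vout: float, r_load_ohm: float = 100.0):
    """Return (current in mA, power in mW) for a resistive load."""
    current_ma = vout / r_load_ohm * 1000.0
    power_mw = vout ** 2 / r_load_ohm * 1000.0
    return current_ma, power_mw


for vout in (1.5, 2.5):
    current_ma, power_mw = load_current_and_power(vout)
    print(f"Vout={vout}V: {current_ma:.1f} mA, {power_mw:.1f} mW")
```

The load draws 15 mA (22.5 mW) at 1.5V and 25 mA (62.5 mW) at 2.5V, small enough that no dedicated cooling of the load is required.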



Figure 6: Graphical representation of the sequence applied to the modules during the irradiation. Note how the disables in the middle and at the end of the sequence happened with an input voltage of 12V (nominal, in reality the voltage was closer to 11V during operation because of the drop along the cables).

### B. Experimental results

Irradiation started on May 2 at around 6pm, while detectors inside the box measured -27C and 10% humidity. During the first hours the on-line monitors were used to estimate the irradiation rate (neutrons/cm2 and TID), and the box was moved increasingly closer to the beam in several steps. Our target was to reach and preferably exceed the radiation levels for the 2017 CMS run in the location where the DCDC converters are installed, namely 2.7e13 n/cm<sup>2</sup> for FPIX and 1.7e13 n/cm<sup>2</sup> for BPIX (these are expressed in 1 MeV equivalent neutrons). When the estimated flux from the active dosimeters was compatible with reaching such a target in the allocated 2-3 weeks, we left the box in that position for the rest of the test.
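The positioning criterion can be illustrated by computing the mean flux needed to reach the target fluences in the allocated time. This sketch assumes a constant flux and an uninterrupted beam, which is a simplification; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def required_mean_flux(target_fluence: float, days: float) -> float:
    """Mean flux (n/cm^2/s) needed to accumulate target_fluence in `days`."""
    if days <= 0:
        raise ValueError("exposure time must be positive")
    return target_fluence / (days * SECONDS_PER_DAY)


# Target fluences from the text, in 1 MeV equivalent neutrons per cm^2.
targets = {"FPIX": 2.7e13, "BPIX": 1.7e13}

for name, fluence in targets.items():
    for weeks in (2, 3):
        flux = required_mean_flux(fluence, 7 * weeks)
        print(f"{name}: {flux:.2e} n/cm^2/s over {weeks} weeks")
```

Under these assumptions the FPIX target requires a mean flux of roughly 1.5e7 to 2.2e7 n/cm<sup>2</sup>/s, depending on whether 3 or 2 weeks are available.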

The irradiation continued for about 2 and a half weeks, and was interrupted on May 21 at midday by removing the Cu target from the beam. The cooling box was left in the same position and at the same temperature for several days, still always running the same acquisition sequence. On June 5 the temperature was increased and the next day the full system was removed from the IRRAD experimental area. Samples were placed in a storage room without bias.

During the irradiation, a large number of samples showed the "high current" type of damage while one sample failed to provide the output voltage. The map of Fig.7 illustrates the position of the damaged converters, their chronological order of damage, the position of the motherboards in the radiation field and the most reasonable dosimetry during the run (details in the caption). The correlation with the proximity to the beam is evident, proving that the damage is indeed associated with the integrated flux of particles: a threshold integrated flux is needed for the damage to be produced. Another fundamental observation concerns the moment in the functional sequence when the damage appeared. Although the DCDCs are in the on state for 97% of the time, the damage never happens in this state, but always during or immediately after a disable. More details of the failures are listed in Fig.8.



Figure 7: Geographical summary of the results of the first irradiation run in IRRAD. The four motherboards are positioned inside the cold box, whose borders are visible in the image. In red are the samples that showed the "high current" type of damage during the irradiation, in black the single sample that failed and in blue those that showed "high current" after the end of the irradiation, when the samples were still in the box at -25C. The number indicates the chronological sequence of the damages. A magnification of the modules is shown below this caption. The pink squares inside the box and in-between the motherboards report the reading from the passive dosimeters in that location (they were exposed for the full irradiation). The black squares at the edge of the motherboards indicate the total accumulated fluence, in 1MeV equivalent neutrons, from the active dosimeters in that location. Just below those numbers, the red squares are the estimated levels of dose in the same location (from comparison with data from the second run). Finally, the red squares around the box represent the readings from passive dosimeters installed in those locations.



| Type | Vout (V) | Date | Time | Off before (mA, Vin=12V) | Off after (mA, Vin=12V) | On before (mA, Vin=12V) | On after (mA, Vin=12V) | Comment | When |
|------|----------|------|------|--------------------------|-------------------------|-------------------------|------------------------|---------|------|
| Aachen  | 2.36    | 11.May | 06:09 | 4.2                      | 8.2       | 58.1      | 59.4     |              | Chip disabled, end of first disable sequence at the first measurement in off                                                                                                           |
| Aachen  | 2.37    | 11.May | 17:01 | 3.9                      | 8.9       | 56.1      | 57.3     |              | Chip disabled, end of last disable sequence                                                                                                                                            |
| FEASTMP | 2.53    | 12.May | 12:04 | 4.5                      | 14.2      | 56.7      | 59       | UVLO damaged | Chip disabled, end of last disable sequence                                                                                                                                            |
| Aachen  | 2.33    | 13.May | 02:53 | 4.5                      | 7.8       | 57.7      | 58.6     |              | Chip disabled, end of last disable sequence                                                                                                                                            |
| Aachen  | 2.4     | 13.May | 21:56 | 4.4                      | 10.8      | 61.1      | 62.9     |              | Chip disabled, end of last disable sequence                                                                                                                                            |
| FEASTMP | 1.52    | 14.May | 01:09 | 3.7                      | 12.2      | 41.8      | 43.3     |              | Chip disabled, end of first disable sequence between<br>two consecutive measurements in off state.                                                                                     |
| FEASTMP | 2.48    | 15.May | 00:26 | 4.6                      | 15.2      | 56.9      | 60.2     |              | End of first enable sequence                                                                                                                                                           |
| Aachen  | 2.39    | 15.May | 04:40 | 4.3                      | 8.9       | 57.1      | 58.6     |              | Chip disabled, end of first disable sequence at the first measurement in off                                                                                                           |
| Aachen  | 2.35    | 17.May | 01:07 | 4.4                      | 10.6      | 55.9      | 57.7     |              | Chip disabled, end of first disable sequence at the first measurement in off                                                                                                           |
| Aachen  | 2.4     | 17.May | 15:56 | 4.3                      | 9         | 55.7      | 57.3     |              | Chip disabled, end of first disable sequence at the first measurement in off                                                                                                           |
| FEASTMP | 2.49    | 17.May | 18:04 | 4.5                      | 12        | 55.7      | 10.1     | DEAD         | During first disable/enable sequence. Current<br>measured when disabled in that sequence was 3.8mA -<br>smaller than before - then the converter did not turn<br>on.                   |
| Aachen  | 2.38    | 18.May | 10:59 | 4.4                      | 8.2       | 56.4      | 57       |              | Seen in first IV curve after night restart of PC (DCDCs<br>left on). At restart, the PS was turned off and all<br>instrument re-initialized, and first IV shows the larger<br>current. |
| FEASTMP | 1.54    | 18.May | 13:15 | 4.8                      | 11.1      | 47        | 48.3     |              | During IV curve.                                                                                                                                                                       |
| FEASTMP | 1.47    | 20.May | 09:15 | 5.1                      | 11.8      | 39.9      | 41.3     |              | Chip disabled, end of first disable sequence at the first measurement in off                                                                                                           |
| FEASTMP | 1.52    | 22.May | 14:34 | 4.2                      | 10.2      | 40.3      | 41.3     | NO BEAM      | End of second enable sequence                                                                                                                                                          |
| FEASTMP | 2.48    | 24.May | 10:16 | 4.2                      | 8.11      | 49.6      | 50.4     | NO BEAM      | Chip disabled, end of last disable sequence                                                                                                                                            |

Figure 8: Table of the damaged converters in chronological order.

The table shown in Fig.8 reports the date and time of the damage, as well as the measured currents with the DCDC off or on immediately before and after the damage. All currents are at an input voltage (nominal) of 12V. Signs of damage (higher current) are systematically observed in proximity of a disable/enable cycle and most often as soon as the current is measured after a disable. This is similar to the failure of DCDCs in the CMS experiment - unfortunately we only have data for the failing converters in CMS, not for the vast majority of damaged samples that went into the "high current" state and for which we do not know when the damage occurred.
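The "high current" signature in Fig. 8 can be made more explicit by computing, for each converter, the ratio of the disabled-state input current after and before the damage. The values below are transcribed from Fig. 8; using the ratio as a figure of merit is an illustrative choice made here, not a criterion taken from the analysis.

```python
# (type, off-state current before damage in mA, off-state current after in mA),
# transcribed from Fig. 8 in chronological order.
rows = [
    ("Aachen", 4.2, 8.2), ("Aachen", 3.9, 8.9), ("FEASTMP", 4.5, 14.2),
    ("Aachen", 4.5, 7.8), ("Aachen", 4.4, 10.8), ("FEASTMP", 3.7, 12.2),
    ("FEASTMP", 4.6, 15.2), ("Aachen", 4.3, 8.9), ("Aachen", 4.4, 10.6),
    ("Aachen", 4.3, 9.0), ("FEASTMP", 4.5, 12.0), ("Aachen", 4.4, 8.2),
    ("FEASTMP", 4.8, 11.1), ("FEASTMP", 5.1, 11.8), ("FEASTMP", 4.2, 10.2),
    ("FEASTMP", 4.2, 8.11),
]

ratios = [after / before for _, before, after in rows]

for (kind, before, after), ratio in zip(rows, ratios):
    print(f"{kind:8s} {before:5.1f} mA -> {after:5.2f} mA  (x{ratio:.2f})")

print(f"min ratio {min(ratios):.2f}, max ratio {max(ratios):.2f}")
```

For every converter listed, the off-state current at 12V increases by a factor between about 1.7 and 3.3, while the on-state currents in the same table change by only a few percent (with the exception of the dead sample).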

The above results represented a real breakthrough in the investigation of the problem. For the first time it was possible to reproduce the problem observed in CMS, and moreover it was possible to observe that:

- there is a direct relationship between the integrated flux and the occurrence of the damage
- the failure rate is not constant during the test, but converters start to show damage only after 9 days in the irradiation area
- only the converters close to the beam, exposed to larger fluxes, show evidence of damage in the test
- also the "high current" damage occurs during a disable-enable sequence.
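The onset time in the list above can be cross-checked from the start of the irradiation (May 2, around 6pm) and the first entry of Fig. 8 (May 11, 06:09). The sketch below rounds "around 6pm" and "midday" to 18:00 and 12:00, which is an approximation; the variable names are hypothetical.

```python
from datetime import datetime

# Times taken from the text and from Fig. 8 (year 2018).
beam_start = datetime(2018, 5, 2, 18, 0)     # "around 6pm", rounded
beam_stop = datetime(2018, 5, 21, 12, 0)     # "midday", rounded
first_damage = datetime(2018, 5, 11, 6, 9)   # first row of Fig. 8

days_to_first_damage = (first_damage - beam_start).total_seconds() / 86_400
beam_days = (beam_stop - beam_start).total_seconds() / 86_400
fraction = days_to_first_damage / beam_days

print(f"first damage after {days_to_first_damage:.1f} days of beam")
print(f"total beam time {beam_days:.2f} days")
print(f"fraction of beam time elapsed at first damage: {fraction:.0%}")
```

The first damage appears about 8.5 days after the start of the beam, consistent with the roughly 9 days quoted, and slightly before the midpoint of the exposure.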

These were powerful hints indicating that a cumulative radiation effect might be responsible for the damage and also proving that further tests should be conducted with the same disable-enable sequence. When, driven by these results, DCDC modules were irradiated at high dose rate at our X-ray facility, the observation of the V33Dr node during the disable operation revealed a voltage peak that could exceed 8V in some conditions. This was the key to the full understanding of the problem.

Since the problem is associated with a leakage current induced by TID, it is essential to understand the level of TID at which the different samples failed in the IRRAD experiment. This helps to set a limit for the maximum TID beyond which the risk of failure exists - at least at the dose rate and temperature of the test (both represent a worst case with respect to the typical application of the FEAST chips). Unfortunately the determination of the TID is not easy and a relatively large uncertainty will remain for this irradiation, for several reasons:

- the samples are distributed at very variable distance from the beam, hence at different dose rates
- the dosimetry is provided by a limited amount of devices:
  - inside the box we had 2 active dosimeters, one at the front and one at the back of the stack of motherboards (front and back with respect to the beam direction). These were positioned in the median point of the motherboards, so we do not have a direct measurement at the distance corresponding to the damaged samples
  - the dose measurement from the active dosimeter (RADFET) was deemed unreliable: the device was used in this configuration beyond its calibration range, and results were unreasonable
  - inside the box we also have passive dosimeters in only 2 locations (both PAD and RPL dosimeters in each location, and they were in good agreement with each other)
  - outside the box, we have a good range of passive dosimeters that should help figure out the spatial variation of the beam intensity, and to see if the simulation maps are reasonable.

The estimate of the TID to failure for all samples is based on data from both this run and the second, where a better spatial covering of the volume was achieved by placing a larger number of passive dosimeters. The best estimates are reported in Fig.9. It is important here to highlight the limits of the dosimetry, not only due to the lack of adequate spatial coverage, but also to the type of dosimeter used. The passive dosimeters were of two technologies: alanine PADs and RPL. They were measured after irradiation, and the measured change in their properties compared with a calibration table where the same devices were exposed in a <sup>60</sup>Co source. Therefore the resulting TID levels indicate the equivalent dose of photons in air needed for the dosimeters to show a similar change in properties.



Figure 9: Best estimate of the TID to failure for all samples. The two samples in blue failed after the end of the irradiation, so their TID to failure is identical to the total TID at the end of the exposure. None of the 16 samples in the top green dashed area failed; it is not possible to have a good idea of the dose these devices accumulated during the test, but it was certainly below 2Mrad. So the best conclusion one can get for the minimum TID to failure is the dose for the first failure, about 1.2Mrad, in the conditions of this experiment. Note that the 16 samples in the "no failures" zone are not shown, but are represented by the 16 numbered white holes on the top of the boards.

## 2. Second Irradiation Run

### A. Motivation and experimental conditions

Although the origin of the damage and failure was understood after the first IRRAD run and the following X-ray irradiation studies, we found it very useful to perform a second test in IRRAD in order to confirm the conclusions reached at this stage of the investigation, and to collect additional and more precise information on some aspects. A second irradiation run was hence organised for the summer. For this occasion the full test system was duplicated and a total of 64 DCDC converter modules were used. Also, the motherboard was redesigned to align all the 8 modules as close as possible to the beam and increase the number of samples exposed to the largest flux. Fig. 10 shows the layout of the new motherboard, which was only produced with the connector receiving the FEASTMP type of modules, compared to the old layout (to connect the Aachen type we used a dedicated connector transposer that was meanwhile custom designed and manufactured).



Figure 10: Layout of the new (left) board compared to the old one used in the first run.

One full acquisition system was used to test 4 motherboards installed again inside the cold box, at the same -25C temperature, while the other surveyed and biased 4 motherboards positioned on top of the box, at room temperature. Since the TID-induced leakage current increase is strongly dependent on temperature, the exposure of both cold and "warm" modules was expected to yield useful information.

Both Aachen and FEASTMP modules at different biases (1.5, 2.5 and 3.3V) were used - but just because of the limited availability of each type and the large quantity needed for the test. Some bPOL12V.V3 were also added because in X-ray tests they showed the same peaked V33Dr signature, although at larger TID levels and normally with "lethal" consequences for the chip.

The main provision that was suggested to protect the chips from damage consisted in the addition of an external 3kOhm resistor on the V33Dr node. A good fraction of the modules exposed in this second run were modified in that way, while another number had a resistor of 15kOhm in the same position. This latter version was intended to reproduce the resistance value in the FEAST2.2 revision of the DCDC ASIC that was not yet available at the beginning of July.

Other than the addition of an external resistor, another provision proposed to alleviate the problem in the application - for modules already produced that cannot be modified - is to avoid disabling at high input voltage. To verify the effectiveness of this provision, a full motherboard in the cold box used a modified version of the control program and a separate power supply so that a slightly different sequence was applied to the converters. Rather than disabling the converter while the input voltage is at 12V, this sequence turns off the supply voltage as shown in Fig.11. However, if during the exposure the power supply gets stuck at 12V, there is one occasion in the cycle when the converters are disabled at high input voltage. This happened on one occasion only (on all other occasions the supply got stuck at 0V, which is harmless) and the system was reinitialised after 3 full sequences, so the event happened only 3 times and after a week (when no converter had yet shown any sign of damage, indicating that the leakage was not yet sufficient for that purpose).
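The difference between the two sequences can be illustrated by scanning each one for disable transitions that occur while the input voltage is high. This is a simplified sketch: the step ordering of the power-cycled sequence and the 5.2V threshold (the upper end of the I-V sweep) are assumptions made for illustration, and the names are hypothetical.

```python
# Assumption: any input voltage above the upper end of the I-V sweep is
# treated as "high". The report does not define an explicit threshold.
HIGH_VIN_V = 5.2


def disables_at_high_vin(states):
    """Indices of steps where enable goes on -> off while Vin is high.

    `states` is a list of (vin_volts, enabled) tuples, one per step.
    """
    hits = []
    for i in range(1, len(states)):
        was_enabled = states[i - 1][1]
        vin, enabled = states[i]
        if was_enabled and not enabled and vin > HIGH_VIN_V:
            hits.append(i)
    return hits


# Simplified standard sequence: sweep, raise Vin, two enable/disable cycles.
standard = [(2.4, False), (5.2, False), (12.0, False),
            (12.0, True), (12.0, False), (12.0, True), (12.0, False)]

# Simplified power-cycled sequence: the converter is disabled only for the
# sweep and is otherwise turned off by interrupting the supply.
power_cycled = [(0.0, True), (0.0, False), (2.4, False), (5.2, False),
                (5.2, True), (12.0, True), (0.0, True), (12.0, True),
                (0.0, True)]

# Same enable pattern, but with the supply stuck at 12V throughout.
stuck_at_12v = [(12.0, enabled) for _, enabled in power_cycled]

print("standard:     ", disables_at_high_vin(standard))
print("power cycled: ", disables_at_high_vin(power_cycled))
print("stuck at 12V: ", disables_at_high_vin(stuck_at_12v))
```

In this model the standard sequence contains two disables at high voltage per cycle, the power-cycled sequence none, and the power-cycled sequence with a stuck supply exactly one, matching the single occasion per cycle described above.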

To answer a question from LHCb, where some modules at room temperature are powered at 8V only and are subject to a radiation field of several hundred krad, a motherboard at room temperature was powered by a separate voltage supply regulated at 8V (8.1V effective when the modules are disabled, decreasing to 7V when they are enabled). This supply was not controlled by the PC, so the 8V were present at all times (no I-V sweep for these modules).

Figs. 12 and 13 show the motherboards installed on the inner and outer sides of the cover of the cold box, ready to be installed in the IRRAD beam line. In Fig.14, the cover is positioned on the box and all cables are connected to the pass-through system bringing the signals to the control room. In Figs. 12 and 13 it is possible to see the active and passive dosimeters that provided an improved spatial coverage with respect to run 1. This should help to determine the doses to failure more precisely, and hence to define a safe operating area for the modules using version 2.1 of FEASTMP. A detail of the position and type of the dosimeters is given in Fig. 15.

An important last point to mention is the presence of RC filters at the enable input of half of the modules, both inside and outside the box. This was done to rule out the possible contribution of noise pick-up on the long enable line, a hypothesis considered in the past. In each motherboard, the RC filter was installed in a checkerboard pattern so that identical modules were exposed with and without the filter.



Figure 11: Graphical representation of the sequence used for a single motherboard in the cold box during the second run. The converter is only disabled for the I-V sweep, and remains enabled all the time otherwise. To turn it off, the supply voltage is interrupted.



Figure 12: The 4 motherboards installed in the inner side of the box cover, ready for the installation in the experimental area. The dosimeters are visible. The third motherboard from the left was power cycled (sequence shown in Fig.11), while the others had the standard sequence used already in the first run. The actual position of the boards in the experiment is a vertical mirror of this image, where the inner side of the cover is visible.



Figure 13: The 4 motherboards installed in the outer side of the box cover, ready for the installation in the experimental area. The dosimeters are visible. All boards had the standard sequence of I-V sweep and disable-enable, but the last board to the right was powered by a separate supply steadily at 8V (not controlled by the PC).



Figure 14: The full box ready for the irradiation run, with the 4 motherboards at room temperature on top.



Figure 15: Position of the passive dosimeters inside and outside the cold box. Active dosimeters were positioned in locations 6 and 8 (outside) as well as 1 and 3 (inside).

### B. Experimental results

The irradiation started on July 4, with first beam at around 9pm. In the following days, the box was moved several times to get closer to the beam and increase the fluxes - always guided by the results of the active dosimeters. The box and Cu target were both removed from the beam on July 24 at 19:15, and then removed from the experimental area on August 2 when they were positioned in a storage room and kept under bias by the same acquisition system.

As already mentioned above, the power supplies occasionally got stuck (loss of communication with the PC running LabVIEW). For motherboards 1, 2 and 4 in cold and 1, 2 and 3 at room T this had no real impact on the measurement since the supply always got stuck at 12V. So all results are representative, with the modules properly biased and with the disable happening at high voltage. For motherboard 4 at room T the supply was not controlled by the program, so there was no issue. However, for motherboard 3 in cold the interruptions had consequences on the sequence. The list of interruptions is shown in Fig. 16.

In the first occurrence the supply was stuck at 12V, meaning that once during the cycle the disable was taking place at high voltage. This happened for 3 cycles but early enough in the test (no module had failed yet). In all other events the supply got stuck at 0V. This condition does not represent a threat to the modules, but in the absence of bias the leakage current might anneal partially - in particular since the irradiation was continued. Overall, however, the total time at 0V was limited to about 13 hours (13h19min from the durations listed in Fig. 16), which is probably negligible.

| Condition    | Date start | Time start | Date stop | Time stop | Duration |
|--------------|------------|------------|-----------|-----------|----------|
| Stuck at 12V | 09.Jul     | 04:02      |           | 11:45     |          |
| Stuck at 0V  | 17.Jul     | 05:39      |           | 10:59     | 5h20min  |
| Stuck at 0V  | 18.Jul     | 17:41      |           | 19:32     | 1h51min  |
| Stuck at 0V  | 20.Jul     | 05:24      |           | 11:32     | 6h8min   |

Figure 16: List of the interruptions of communication to the supply powering motherboard 3 in the cold box. This is the motherboard with a different sequence, as shown in Fig. 11.
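The total time spent at 0V can be cross-checked directly from the start and stop times listed in Fig. 16. The following is a minimal sketch of that arithmetic; it assumes that each interruption ended on the same day it started (the "Date stop" column is empty in the table) and that all dates refer to 2018. The names `ZERO_VOLT_INTERVALS` and `total_duration` are introduced here for illustration only.

```python
from datetime import datetime, timedelta

# Start/stop times of the "stuck at 0V" intervals, copied from Fig. 16.
# Assumption: each stop falls on the same day as the start (year 2018).
ZERO_VOLT_INTERVALS = [
    ("2018-07-17 05:39", "2018-07-17 10:59"),
    ("2018-07-18 17:41", "2018-07-18 19:32"),
    ("2018-07-20 05:24", "2018-07-20 11:32"),
]


def total_duration(intervals):
    """Sum the length of all (start, stop) intervals given as strings."""
    fmt = "%Y-%m-%d %H:%M"
    total = timedelta()
    for start, stop in intervals:
        total += datetime.strptime(stop, fmt) - datetime.strptime(start, fmt)
    return total


total = total_duration(ZERO_VOLT_INTERVALS)
hours, remainder = divmod(int(total.total_seconds()), 3600)
print(f"Total time at 0V: {hours}h{remainder // 60}min")
```

The same function can be applied to the single 12V interval of the table if its duration is needed.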

During this second run no converter broke, but a number of samples were damaged in the "high current" mode. Fig. 17 summarises the results of the run for the motherboards in the cold box, detailing the population of the modules and the results from the dosimetry. Only samples in motherboard 2 were damaged, in good agreement with expectations, because it was the only board with unprotected modules disabled at high voltage in the sequence. None of the modules protected by either the 3 or the 15kOhm resistor, or by a different disable sequence (power down), were damaged. No bPOL12V module was damaged either. The readings from the passive dosimeters, reported in blue, have been adjusted to take into account the temperature inside the box, while the 1MeV-equivalent neutron fluences in green are those reported by the active dosimeters (calibrated diodes). The doses to failure for the modules in motherboard 2 are estimates based on the readings from the nearest passive dosimeters (in blue). The same considerations made in the discussion of the results from run 1 apply here for the doses: the levels come directly from the passive dosimeters calibrated in a <sup>60</sup>Co source.

It should be noted that some of the samples that survived the exposure of run 1 unharmed were used again for this run, in an attempt to provoke earlier failure and to check whether the previously accumulated dose still influenced the result after a long period of annealing (although at room temperature and without bias). These were the first to show evidence of damage in the cold box, as shown in Fig. 17.

The same representation is used in Fig. 18 for the motherboards exposed at room temperature. Here also there was no broken converter: only the "high current" damage was observed. In this case, some of the unprotected modules were not damaged in the two leftmost motherboards, and there is no clear tendency for early failure of modules already irradiated in run 1.



Figure 17: Summary of the results in the cold box for run 2. Dark grey squares represent fresh modules with FEAST2.1 (Aachen or FEASTMP designs); clear grey represents modules with FEAST2.1 already exposed during run 1 (but unharmed); yellow squares represent bPOL12V.V3 modules. If the modules were protected by a resistor on V33Dr, the value of the resistor in Ohm is indicated on the module. The temporal sequence of the observed damages is illustrated by the red numbers, and the values close to each damage indicate the estimated TID to failure. Results from the dosimetry are reported, in the appropriate location, in the blue and green squares.



Figure 18: Summary of the results for samples at room temperature during run 2. The representation is the same as for Fig. 17 above. The dosimeters report the same levels inside and outside the box: because of the geographical arrangement and of the actual readings we believe that the environment was very comparable in the two locations.

As in the previous run, all damage appears during or after a disable sequence. The currents after the event range between about 8 and 13mA. Also in this case the current increase in the off state is considerably larger than the one in the on state, which is limited to 1-2 mA. All damage characteristics are hence very comparable to those observed in the first run. This result is incompatible with the hypothesis that the damage is due to noise pickup on the enable line, because all modules in the board, regardless of the presence of the RC filter, have been identically damaged.

In order to define a safe area of operation for the FEAST2.1 converters, it is important to summarise all results obtained in the two irradiation runs. This is done in Fig. 19, which reports the best estimate for the TID to failure for all modules. The image has to be taken with caution, since the dose levels are, as already pointed out twice before, given by a combination of passive dosimeter results and extrapolation. Also, the size of each data point is proportional to the uncertainty on the dose added by the physical distance of the device from a passive dosimeter, and only to that. The closer the module is to a dosimeter, the smaller the uncertainty added by this parameter to the overall large error on the TID estimates. Some general conclusions can be drawn from these results:

- in all cases, damage appears only at estimated TID levels above 1Mrad
- samples at room temperature start to show damage at a larger dose than those kept at -25C
- samples already exposed in run 1 are damaged at lower doses when irradiated again at -25C, but this tendency is not observed at room temperature.



Figure 19: Estimated dose to failure for all samples damaged during the two IRRAD runs. Estimates are based on all the available dosimetry results, and the size of each data point only represents additional uncertainty proportional to the physical distance of the module to the nearest dosimeter. Green data points refer to the first irradiation run, blue (for modules in the cold box) and red (for modules at room temperature) refer to the second irradiation run. Samples exposed in both runs are represented with darker colours.

Since all input currents were monitored regularly during the irradiation, it is possible to analyse the available data and extract some additional information about the variation in the modules' performance in the radiation environment. This analysis was done thoroughly for both the modules in the cold box and outside of it. The input current measured during the full irradiation run for one cold sample (B1C1) is shown in Fig. 20 as an example. Only the blue line is relevant here; the purple line represents the number of converters enabled at any time. The figure is difficult to read, especially for the points taken during the I-V sweep, but it is possible to see the trend of increased on-current with time.

Other information can be extracted from the direct comparison of both off- and on-currents at the beginning and end of the irradiation for all converters. Given the relatively large variability in the converter characteristics (pre-irradiated in run1, fresh and unprotected, fresh and protected by a 3 or 15 kOhm resistor, FEASTMP vs. Aachen vs. bPOL12V, different output voltages, cold and room temperature), the observation of the trends enables some interesting conclusions. These are the main ones, some of which can be seen in Fig. 21, which summarises average results:

- Similar converters always show very comparable current increases
- Both on- and off-current increases are larger for cold modules than for modules at room T. This is in principle in agreement with the known temperature dependence of the TID-induced leakage current
- Similar converters placed in different positions during the test yielded comparable results, indicating that the TID must have been rather uniform during the exposure. The only exception is for bPOL12V modules, indicating with low statistical significance that inside the box the dose was larger on the left board (see Fig. 17). The current increase was respectively 22 and 18mA
- FEASTMP and Aachen modules show very comparable results
- The output voltage does not affect the on-current increase (with one possible exception for 3.3V at cold, but the statistical significance is low)
- Modules already exposed in run1 show a larger initial on-current by a few mA (not true for the off-current). Their current increase is then smaller during run2
- Modules biased at 8V at room temperature seem generally to show a smaller increase
- bPOL12V samples have a significantly larger current increase.
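The comparison described above amounts to grouping the converters by their characteristics and averaging the difference between the final and initial input current in each group. The following is a minimal sketch of that bookkeeping, not the analysis code actually used for this report. The record layout, the function name `average_increase` and all numerical values are hypothetical placeholders chosen for illustration; they are not measurements from the run.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (module type, temperature, state, initial mA, final mA).
# Placeholder numbers for illustration only, NOT data from the irradiation.
RECORDS = [
    ("FEAST2.1", "cold", "on", 50.0, 60.0),
    ("FEAST2.1", "cold", "on", 52.0, 64.0),
    ("FEAST2.1", "room", "on", 50.0, 54.0),
    ("bPOL12V", "cold", "on", 40.0, 70.0),
]


def average_increase(records):
    """Return the mean current increase (mA) for each group of converters."""
    groups = defaultdict(list)
    for module_type, temperature, state, initial_ma, final_ma in records:
        groups[(module_type, temperature, state)].append(final_ma - initial_ma)
    return {key: mean(deltas) for key, deltas in groups.items()}


for key, increase in average_increase(RECORDS).items():
    print(key, f"{increase:.1f} mA")
```

Additional grouping keys (protection resistor, output voltage, pre-exposure in run1) can be added to the tuple in the same way.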



Figure 20: Input current (in blue) measured during each sequence for the full irradiation on converter module B1C1, inside the cold box. Only the on-current is readable in this graph, and it shows an increase in time associated to the TID-induced leakage current.



Figure 21: Average input current increase for FEAST2.1 (both FEASTMP and Aachen modules results are merged here) and bPOL12V samples. Results are for off- and on-currents of fresh modules, and compared with modules already exposed in run1, in green. The motherboards powered at 8V supply voltage are not included in the averages. Note that the modules from run1 have a smaller current increase during run2, but they already have a larger current than fresh modules at the beginning of run2.

## 3. Conclusions

At the end of these two long irradiation campaigns, it is possible to draw a series of conclusions. These will be listed and discussed in this section.

### A. Origin of the failures in the CMS pixel system

Combined with the results from X-ray irradiation, the IRRAD tests have clearly indicated that the failures in the CMS pixel system are traceable to a vulnerability of the FEAST2.1 ASICs to total ionising dose. TID induces leakage currents in the high-voltage NMOS transistors in the chips, and this produces an over-voltage on the V33Dr node every time the DCDC converter is disabled. This mechanism occurs only when the leakage is large, which requires irradiation to levels above 1Mrad in relatively short periods of time (high dose rate). It appears obvious from the problems experienced by CMS during the 2017 runs that these conditions are met in this application, hence FEAST2.1-based DCDC modules are at risk.

The tests of run 2 clearly rule out any role of noise pick-up on the enable or power good lines in the onset of the problem. Neighbouring samples with or without enable filters, and without any PowerGood connection to any line, are subject to identical damage in the experiment.

### B. Protection strategies for FEAST2.1

Two different strategies for the protection of DCDC modules using FEAST2.1 have been proposed:

- addition of a 3kOhm resistance on-board on the V33Dr node
- avoidance of disable command when the converter is powered at high voltage.

These strategies were tested both in IRRAD run 2 and at the X-ray machine. The addition of a 3kOhm resistor very efficiently eliminates the leakage-current induced voltage peak, hence ensuring full protection even at the large dose rates of the X-ray machine. In IRRAD run 2, samples protected with this strategy did not show any damage (some samples were irradiated to an estimated TID well above 3Mrad at -25C).

The second strategy was tested in two different ways: in the cold box a full motherboard was not disabled but its power supply was turned off instead, while at room temperature a full motherboard was exposed while steadily powered at 8V (effectively 7V when the converters were running, rising to 8V after the disable). Again, no sample was damaged during the test at doses that were estimated to be between 2.2 and 3.2 Mrad for the power down at -25C and 2.8-3 Mrad for the 8V at room temperature.

Both strategies hence seem effective, but preference should be given to the addition of a resistor, because in this way the voltage peaks are completely removed and there is no impact on the system (powering down the full module, especially when several modules are powered by the same supply along long cables, is not very practical and might induce oscillations). On the other hand, although there is no evidence of damage when the supply voltage is reduced to 7-8V, an over-voltage is still present at the V33Dr node at every disable. This might lead to stress on the module that in the long run could create reliability problems.

### C. Safe operating area for unprotected FEAST2.1 ASICs

With more than 60,000 FEAST2 and FEAST2.1 ASICs already distributed to the LHC experiments, this is a very relevant topic, and the results from the IRRAD tests are instrumental in defining the safe area of operation of unprotected FEAST modules. We can discuss the available data separately for different ambient temperatures, since from the results and from the known characteristics of radiation-induced leakage currents the problem gets worse at lower temperature.

At -25C the distribution of the dose-to-failure of fresh samples extends from 1.2 to 2.3Mrad, lowering to 1Mrad for samples already exposed to >500krad in run 1. These levels are valid for the dose rate used in the experiment, an average of the order of 5-10krad/hour, and are expected to increase at lower rates - however not by much, since it looks like there is little annealing at this temperature.

At room temperature, instead, both fresh and pre-exposed samples appear to be damaged only from 1.7Mrad, with more than half of the population still unharmed after more than 3Mrad. These levels again are valid for the dose rate used in the experiment (5-10 krad/hour), but in this case it is reasonable to expect larger doses-to-failure at considerably lower dose rates, since annealing is relevant at room temperature and above - a condition very common for FEASTMP modules used in the LHC experiments. This suggests that experiments where FEASTMP modules are water- or air-cooled and are exposed to 500krad or less over 10 years (average dose rate < 6 rad/hour) are to be considered safe against the occurrence of the same problem that affected the CMS pixel detector.
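The average dose rate quoted above follows from simple arithmetic, sketched below together with a comparison against the 5-10 krad/hour of the IRRAD experiment. The sketch assumes continuous exposure over the full 10 years, with a year taken as 365.25 days; the function and variable names are introduced here for illustration only.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: continuous exposure all year


def average_dose_rate_rad_per_hour(total_dose_krad, years):
    """Average dose rate in rad/hour for a total dose spread over `years`."""
    return total_dose_krad * 1e3 / (years * HOURS_PER_YEAR)


# 500 krad accumulated over 10 years, as in the text above.
lhc_rate = average_dose_rate_rad_per_hour(500, 10)

# Dose rate range of the IRRAD experiment, 5-10 krad/hour, in rad/hour.
irrad_rates = (5e3, 10e3)
ratios = tuple(rate / lhc_rate for rate in irrad_rates)

print(f"Average rate for 500 krad in 10 years: {lhc_rate:.1f} rad/hour")
print(f"IRRAD rate is about {ratios[0]:.0f} to {ratios[1]:.0f} times higher")
```

Under these assumptions the experiment delivered the dose roughly three orders of magnitude faster than the application scenario considered, which is why annealing at room temperature is expected to matter in the latter.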

### D. Estimated improved tolerance of FEAST2.2

Some indication of the improved tolerance of the modified FEAST2.2 ASIC can also be extracted from IRRAD run 2. In this modified design, a 15 kOhm resistor is inserted on-chip on the V33Dr node. This modification was driven by intuition at a time when the origin of the problem was still unknown, and the value of the resistance was dictated by the availability of resistors in the FEAST2.1 pre-metal masks. In run 2 several modified modules were tested where an external 15 kOhm resistor was added to the V33Dr node to imitate the FEAST2.2 configuration. None of these modules was damaged at either -25C or room temperature, while exposed to an estimated TID level of about 3Mrad. This is a promising indication that should be complemented by more dedicated X-ray irradiation measurements. However, modules using the modified version of the ASIC will certainly allow a large extension of the safe operating area of the modules with respect to the considerations above.

# Acknowledgements

As already stated at the beginning of this document, the authorship is limited to the writing of this report, while all the work was possible thanks to the coordinated activity of a number of colleagues. First the colleagues from the "On-detector power management" CERN Team: S.Michelis and G.Ripamonti (EP-ESE-ME). S.Michelis in particular was directly involved in all aspects of the long investigation of the FEAST module's failure in CMS. N.Bacchetta (EP-UCM) coordinated the supply of samples of the Aachen/CMS modules, and participated in the definition of the test. The setup hardware was built by S.Cuadrado Calzada (EP-CMO) with the help of D.Porret (EP-ESE-ME). A.Karneyeu (EP-UCM) wrote the Labview program for data acquisition while T.Prousalidi (EP-CMX-DA) wrote the Python code for data analysis. M.Hansen (EP-ESE-BE) participated in the design of the IRRAD test system and strongly supported this work. M.Smith (EP-ESE-ME) surveyed the whole second IRRAD run. A last acknowledgement goes to F.Ravotti and G.Pezzullo (both with EP-DT-DD) for all their proactive support with the IRRAD facility.