Switching costs in stochastic environments lead to the emergence of matching behavior in animal decision-making through the promotion of reward learning strategies


Binary choice experiments

To test whether the animals would really adopt a simple win-stay-lose-shift (WSLS) strategy and exhibit probability matching (PM) behavior, we conducted binary choice experiments using budgerigars, which have been widely used in studies of different cognitive abilities, such as vocal learning23,24 and problem solving25,26. In this study, eighteen unrelated budgerigars, ranging in age from under 1 year to 3 years, were used for the binary choice experiments.

Before each experiment, budgerigars were housed individually in cages measuring 20 × 20 × 20 cm. Binary choice experiments were carried out in a mesh cage measuring 2 × 1 × 2 m (Supplementary Fig. S1). A single perch was positioned in the center of the cage at a height of 0.8 m above the ground. Two food cups were placed on the front wall at a height of 1.6 m above the floor and 1.6 m apart, but only one cup contained the food reward in each trial. In the following, we refer to the side with the higher probability of holding the reward as the H side and the other side as the L side. The food reward occurred on the H side with probability \(q\) and on the L side with probability \(1-q\).
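For intuition about the WSLS strategy mentioned above, the following sketch (ours, not the authors' code; the function name and defaults are illustrative) simulates a win-stay-lose-shift agent in this two-cup task: it repeats its previous choice after finding food and switches sides otherwise.

```python
import random

def simulate_wsls(q, n_trials=100, seed=None):
    """Win-stay-lose-shift in the two-cup task: the reward is on side
    'H' with probability q on each trial; the agent stays after a win
    and shifts after a loss. Returns the fraction of H choices."""
    rng = random.Random(seed)
    choice = rng.choice(["H", "L"])          # arbitrary first choice
    h_count = 0
    for _ in range(n_trials):
        h_count += (choice == "H")
        rewarded = "H" if rng.random() < q else "L"
        if choice != rewarded:               # lose -> shift sides
            choice = "L" if choice == "H" else "H"
    return h_count / n_trials
```

Under this rule the probability of choosing H on the next trial equals \(q\) regardless of the current side (stay with probability \(q\) when on H, shift with probability \(q\) when on L), so a WSLS agent matches its choice frequencies to the reward probabilities, which is exactly the PM behavior at issue.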

We first generated food reward placement sequences for 100 trials under three different random levels (\(q\) = 0.5, 0.6 and 0.75) using MATLAB (version 7.5, R2007b, The MathWorks Inc.). Each bird was placed in the experimental cage for two days beforehand to adapt to the environment, and fed from the food cups (both cups contained food during this period). Before the experiments, each bird was deprived of food for 24 hours. Then, for each trial, we placed about 20 grains of millet in the rewarded food cup. Once a bird had made a decision and eaten the millet (after ~8–10 s), we removed both food cups, after which the bird returned to the perch and waited for the next trial, which took place after one minute. If the bird chose the wrong side (i.e., the one without the food reward), we allowed it to fly to the other side and then immediately removed both food cups from the cage. Since a bird would become satiated after about 30 trials, the 100 trials were conducted over three consecutive days. After each day's experiments, the bird was deprived of food until the experiments resumed the next day. To avoid memory interference between random levels, we assigned each bird to a single set of 100 trials. We used three different birds for the experiments under each of the random levels \(q = 0.6\) and 0.75, and five birds under the random level \(q = 0.5\). To control for possible lateral preference effects, we used three other birds under each of the random levels \(q = 0.6\) and 0.75, and one other bird under \(q = 0.5\), with the same food placement sequences but with the reward moved to the opposite side in each trial.
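The reward-placement sequences are straightforward to reproduce. The sketch below (in Python rather than the MATLAB the authors used; function and variable names are ours) draws a 100-trial sequence for each random level and builds the mirrored sequence used to control for lateral preference.

```python
import random

def reward_sequence(q, n_trials=100, seed=None):
    """Place the reward on side 'H' with probability q on each trial,
    independently, and on side 'L' otherwise."""
    rng = random.Random(seed)
    return ["H" if rng.random() < q else "L" for _ in range(n_trials)]

# One sequence per random level (q = 0.5, 0.6, 0.75), plus the mirrored
# version with the reward moved to the opposite side in every trial.
sequences = {q: reward_sequence(q, seed=1) for q in (0.5, 0.6, 0.75)}
mirrored = {q: ["L" if s == "H" else "H" for s in seq]
            for q, seq in sequences.items()}
```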

This study complies with all applicable government regulations regarding the ethical treatment of animals. All animal use and care followed the guidelines of the Institute of Zoology, Chinese Academy of Sciences (CAS). This work was approved by the Animal Care and Use Committee of the Institute of Zoology, CAS.

Evaluating outcome information for decision-making

To assess how our budgerigars made their decisions through reward learning, we first used a one-parameter (time constant \(\tau\)) leaky integration model to quantify the outcome information in each trial27. This model uses a function similar to an exponential filter, derived from a signal-processing method28. Since the food reward was the only payoff the budgerigars obtained in the binary choice experiments, we used the reward history on each side as the outcome information. Due to memory capacity limitations29, only a finite number of previous trials can inform the current decision. More precisely, the outcome information for each side (\(y_i = y_H\) for side H or \(y_L\) for side L) in trial \(t\) was calculated as:

\[
y_i(t) = a\,x_i(t-1) + (1-a)\,y_i(t-1)
\]

(1)

where \(x_i(t-1)\) is the payoff obtained in the previous trial (1 or 0) and \(a = 1 - \exp\left(-1/\tau\right)\) is a constant between 0 and 1, with \(\tau\) the time constant. As a result, the most recent reward is the most informative for the current decision (Supplementary Fig. S2). Moreover, the reward information from the past \(\tau\) trials accounts for 63.2% of the output \(y_i(t)\).
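Equation (1) amounts to an exponentially weighted running average of past payoffs. A minimal sketch (ours; names are illustrative) computing \(y_i(t)\) trial by trial:

```python
import math

def leaky_integrator(rewards, tau):
    """Compute the outcome information y_i(t) from Eq. (1).

    rewards : sequence of 0/1 payoffs x_i earned on side i, one per trial.
    tau     : time constant; a = 1 - exp(-1/tau) sets the leak rate.
    Returns y_i(t) as available at the start of each trial (initially 0).
    """
    a = 1.0 - math.exp(-1.0 / tau)
    y, history = 0.0, []
    for x in rewards:
        history.append(y)        # value guiding the current trial
        y = a * x + (1 - a) * y  # update with this trial's payoff
    return history
```

Because the weight on a payoff received \(k+1\) trials back is \(a(1-a)^k\), the first \(\tau\) weights sum to \(1-(1-a)^{\tau} = 1-e^{-1} \approx 0.632\), which is the 63.2% figure quoted above.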
