Scoring data

After importing free recall data, we have a DataFrame with a row for each study event and a row for each recall event. Next, we need to score the data by matching study events with recall events.

Scoring list recall

First, let’s create a simple sample dataset with two lists:

In [1]: import pandas as pd

In [2]: data = pd.DataFrame(
   ...:     {'subject': [1, 1, 1, 1, 1, 1,
   ...:                  1, 1, 1, 1, 1, 1],
   ...:      'list': [1, 1, 1, 1, 1, 1,
   ...:               2, 2, 2, 2, 2, 2],
   ...:      'trial_type': ['study', 'study', 'study',
   ...:                     'recall', 'recall', 'recall',
   ...:                     'study', 'study', 'study',
   ...:                     'recall', 'recall', 'recall'],
   ...:      'position': [1, 2, 3, 1, 2, 3,
   ...:                   1, 2, 3, 1, 2, 3],
   ...:      'item': ['absence', 'hollow', 'pupil',
   ...:               'pupil', 'absence', 'empty',
   ...:               'fountain', 'piano', 'pillow',
   ...:               'pillow', 'fountain', 'pillow']})
   ...: 

In [3]: data
Out[3]: 
    subject  list trial_type  position      item
0         1     1      study         1   absence
1         1     1      study         2    hollow
2         1     1      study         3     pupil
3         1     1     recall         1     pupil
4         1     1     recall         2   absence
5         1     1     recall         3     empty
6         1     2      study         1  fountain
7         1     2      study         2     piano
8         1     2      study         3    pillow
9         1     2     recall         1    pillow
10        1     2     recall         2  fountain
11        1     2     recall         3    pillow

Next, we’ll merge the study and recall events, matching each recall event to the study event for the same item in the same list:

In [4]: from psifr import fr

In [5]: study = data.query('trial_type == "study"').copy()

In [6]: recall = data.query('trial_type == "recall"').copy()

In [7]: merged = fr.merge_lists(study, recall)

In [8]: merged
Out[8]: 
   subject  list      item  input  output  recalled  repeat  intrusion
0        1     1   absence    1.0     2.0      True       0      False
1        1     1    hollow    2.0     NaN     False       0      False
2        1     1     pupil    3.0     1.0      True       0      False
3        1     1     empty    NaN     3.0      True       0       True
4        1     2  fountain    1.0     2.0      True       0      False
5        1     2     piano    2.0     NaN     False       0      False
6        1     2    pillow    3.0     1.0      True       0      False
7        1     2    pillow    3.0     3.0      True       1      False

For each item, there is one row for each unique combination of input and output position. For example, if an item is presented once in the list but is recalled multiple times, there is one row for each recall attempt. Repeated recalls are indicated by the repeat column, which is zero for the first recall of an item and greater than zero for any later recall of that item. In the second list above, pillow was recalled at output positions 1 and 3, so the second of those rows has repeat set to 1.
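As a rough illustration of how a repeat count like this can be derived, the following pandas-only sketch numbers each recall of the same item within a list. It is not Psifr's implementation; the small recall table and the use of groupby with cumcount are assumptions made for the example.

```python
import pandas as pd

# Recall events for the second list of the sample data.
recall = pd.DataFrame({
    'subject': [1, 1, 1],
    'list': [2, 2, 2],
    'item': ['pillow', 'fountain', 'pillow'],
    'output': [1, 2, 3],
})

# Number the recalls of each item within a subject and list:
# 0 for the first recall of an item, 1 for the second, and so on.
recall['repeat'] = recall.groupby(['subject', 'list', 'item']).cumcount()

# Rows with repeat > 0 are repeated recalls.
repeats = recall[recall['repeat'] > 0]
print(repeats)
```

Only the second recall of pillow ends up in repeats, which matches the row with repeat equal to 1 in the merged table.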

Items that were not recalled have the recalled column set to False. Because they were not recalled, they have no defined output position, so output is set to NaN. Conversely, intrusions have an output position but no input position, because they did not appear in the list, so input is set to NaN. For convenience, the intrusion column labels these recall attempts.
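To see where these missing values come from, it can help to imitate the matching step with a plain pandas outer merge on the first list. This is a simplified sketch under stated assumptions, not how merge_lists is implemented: the input and output column names are set by hand, and row order may differ from the table above.

```python
import pandas as pd

# Study and recall events for the first list of the sample data.
study = pd.DataFrame({
    'subject': [1, 1, 1],
    'list': [1, 1, 1],
    'item': ['absence', 'hollow', 'pupil'],
    'input': [1, 2, 3],
})
recall = pd.DataFrame({
    'subject': [1, 1, 1],
    'list': [1, 1, 1],
    'item': ['pupil', 'absence', 'empty'],
    'output': [1, 2, 3],
})

# An outer merge keeps unmatched rows from both sides and fills
# the missing position with NaN.
merged = pd.merge(study, recall, on=['subject', 'list', 'item'], how='outer')

# A row was recalled if it has an output position; it is an
# intrusion if it has no input position.
merged['recalled'] = merged['output'].notna()
merged['intrusion'] = merged['input'].isna()

not_recalled = merged.loc[~merged['recalled'], 'item'].tolist()
intrusions = merged.loc[merged['intrusion'], 'item'].tolist()
print(not_recalled, intrusions)
```

Here hollow is the only studied item without an output position, and empty is the only recall without an input position.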