Recall performance

First, load some sample data and create a merged DataFrame:

In [1]: from psifr import fr

In [2]: df = fr.sample_data('Morton2013')

In [3]: data = fr.merge_free_recall(df)

Raster plot

Raster plots can give you a quick overview of a whole dataset [RKT16]. We’ll look at all of the first subject’s recalls using plot_raster(). This will plot every individual recall, colored by the serial position of the recalled item in the list. Items near the end of the list are shown in yellow, and items near the beginning of the list are shown in purple. Intrusions of items not on the list are shown in red.

In [4]: subj = fr.filter_data(data, 1)

In [5]: g = fr.plot_raster(subj).add_legend()
[Figure: raster plot of the first subject's recalls]

Serial position curve

We can calculate the average recall for each serial position [Mur62] using spc() and plot the result using plot_spc().

In [6]: recall = fr.spc(data)

In [7]: g = fr.plot_spc(recall)
[Figure: serial position curve]

Using the same plotting function, we can plot the curve for each individual subject:

In [8]: g = fr.plot_spc(recall, col='subject', col_wrap=5)
[Figure: serial position curves for individual subjects]
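To make the calculation concrete, here is a minimal pandas sketch of a serial position curve computed on hand-made data. It is not psifr's implementation. The toy table and its column names (subject, list, input, recall) are assumptions chosen to resemble merged data, and the values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per studied item, flagged by whether
# it was later recalled. Two subjects, two lists each, three items per list.
toy = pd.DataFrame({
    'subject': [1] * 6 + [2] * 6,
    'list': [1, 1, 1, 2, 2, 2] * 2,
    'input': [1, 2, 3] * 4,
    'recall': [True, False, True, False, False, True,
               True, True, False, True, False, True],
})

# Per-subject curve: fraction of lists on which the item at each
# serial (input) position was recalled.
toy_spc = toy.groupby(['subject', 'input'])['recall'].mean()

# Group curve: average the per-subject curves at each serial position.
group_curve = toy_spc.groupby(level='input').mean()
print(group_curve)
```

Averaging within subject first and then across subjects gives each subject equal weight, regardless of how many lists they completed.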

Probability of Nth recall

We can also split recalls by output position using pnr(), for example to test how likely participants were to initiate recall with the last item on the list.

In [9]: prob = fr.pnr(data)

In [10]: prob
Out[10]: 
       subject  output  input      prob  actual  possible
0            1       1      1  0.000000       0        48
1            1       1      2  0.020833       1        48
2            1       1      3  0.000000       0        48
3            1       1      4  0.000000       0        48
4            1       1      5  0.000000       0        48
...        ...     ...    ...       ...     ...       ...
23035       47      24     20       NaN       0         0
23036       47      24     21       NaN       0         0
23037       47      24     22       NaN       0         0
23038       47      24     23       NaN       0         0
23039       47      24     24       NaN       0         0

[23040 rows x 6 columns]

This gives us the probability of recall by output position ('output') and serial or input position ('input'). Each probability ('prob') is the 'actual' count divided by the 'possible' count, and is NaN where 'possible' is 0. This is a lot to look at all at once, so it may be useful to plot just the first three output positions. We can plot the curves using plot_spc(), which takes an optional hue input specifying a variable used to split the data into curves of different colors.

In [11]: pfr = prob.query('output <= 3')

In [12]: g = fr.plot_spc(pfr, hue='output').add_legend()
[Figure: probability of recall by serial position for the first three output positions]

This plot shows which items tend to be recalled early in the recall sequence.
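The logic behind a probability-of-Nth-recall analysis can be sketched in plain Python. The function below is a hypothetical illustration, not psifr's code. It assumes that an item counts as "possible" at an output position whenever a recall was made at that position and the item had not already been recalled on that list, and it expects each recall sequence to contain only correct, non-repeated recalls.

```python
from collections import defaultdict


def prob_nth_recall(recalls_by_list, list_length):
    """Probability of recalling each serial position at each output position.

    recalls_by_list: one sequence per list, giving the 1-based serial
    positions of recalled items in output order (no intrusions or repeats).
    Returns a dict keyed by (output, input).
    """
    actual = defaultdict(int)
    possible = defaultdict(int)
    for recalls in recalls_by_list:
        recalled = set()
        for output, position in enumerate(recalls, start=1):
            # Any item not yet recalled could have been recalled here.
            for candidate in range(1, list_length + 1):
                if candidate not in recalled:
                    possible[(output, candidate)] += 1
            actual[(output, position)] += 1
            recalled.add(position)
    # Combinations that were never possible are left out of the result.
    return {key: actual[key] / n for key, n in possible.items()}


# Three hypothetical lists of three items each.
prob_toy = prob_nth_recall([[3, 1, 2], [3, 2], [1, 3]], list_length=3)
print(prob_toy[(1, 3)])  # probability of starting recall with the last item
```

In this toy example, two of the three lists begin with the final item, so the probability of first recall for serial position 3 is 2/3.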

Prior-list intrusions

Participants will sometimes accidentally recall items from prior lists; these recalls are known as prior-list intrusions (PLIs). To better understand where prior-list intrusions come from, you can use pli_list_lag() to examine how many lists back the intruded items were originally presented.

First, you need to choose the maximum list lag to consider. This determines which lists are included in the analysis. For example, with a maximum lag of 3, the first 3 lists are excluded from the analysis. This ensures that every included list has enough preceding lists to produce intrusions at each list lag up to the maximum.

In [13]: pli = fr.pli_list_lag(data, max_lag=3)

In [14]: pli
Out[14]: 
     subject  list_lag  count  per_list      prob
0          1         1      7  0.155556  0.259259
1          1         2      5  0.111111  0.185185
2          1         3      0  0.000000  0.000000
3          2         1      9  0.200000  0.191489
4          2         2      2  0.044444  0.042553
..       ...       ...    ...       ...       ...
115       46         2      1  0.022222  0.100000
116       46         3      0  0.000000  0.000000
117       47         1      5  0.111111  0.277778
118       47         2      1  0.022222  0.055556
119       47         3      0  0.000000  0.000000

[120 rows x 5 columns]

In [15]: pli.groupby('list_lag').agg(['mean', 'sem'])
Out[15]: 
         subject          count  ...  per_list      prob          
            mean      sem  mean  ...       sem      mean       sem
list_lag                         ...                              
1           24.9  2.24488  5.55  ...  0.012170  0.210631  0.014726
2           24.9  2.24488  1.35  ...  0.005129  0.043458  0.007032
3           24.9  2.24488  0.75  ...  0.003878  0.023385  0.005602

[3 rows x 8 columns]

The analysis returns a raw count of intrusions at each lag (count), the count divided by the number of included lists (per_list), and the probability of a given intrusion coming from a given lag (prob). In the sample dataset, items from more recent lists (i.e., those with a lower list lag) are more likely to be intruded.
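The three measures can be illustrated with a small plain-Python sketch for a single participant. This is a hypothetical helper, not psifr's implementation. In particular, it assumes that prob is the count divided by all prior-list intrusions made on included lists, including those at lags beyond the maximum.

```python
from collections import Counter


def list_lag_summary(intrusions, n_lists, max_lag):
    """Summarize prior-list intrusions by list lag for one participant.

    intrusions: (recall_list, source_list) pairs of 1-based list numbers,
    where source_list is the earlier list on which the item was presented.
    """
    # Exclude the first max_lag lists, which cannot produce every lag.
    included = [(cur, src) for cur, src in intrusions if cur > max_lag]
    n_included_lists = n_lists - max_lag
    counts = Counter(cur - src for cur, src in included)
    total = len(included)  # assumed denominator for prob
    summary = {}
    for lag in range(1, max_lag + 1):
        count = counts[lag]
        summary[lag] = {
            'count': count,
            'per_list': count / n_included_lists,
            'prob': count / total if total else float('nan'),
        }
    return summary


# Hypothetical intrusions across six lists; the one made on list 2 is excluded.
toy_pli = [(2, 1), (4, 3), (5, 3), (5, 4), (6, 1), (6, 5)]
pli_summary = list_lag_summary(toy_pli, n_lists=6, max_lag=3)
print(pli_summary[1])
```

Here three of the five included intrusions come from the immediately preceding list, so lag 1 has a count of 3, a per_list value of 1.0 (3 intrusions over 3 included lists), and a prob of 0.6.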