Recall order
A key advantage of free recall is that it provides information not only about what items are recalled, but also the order in which they are recalled. A number of analyses have been developed to characterize different influences on recall order, such as the temporal order in which the items were presented at study, the category of the items themselves, or the semantic similarity between pairs of items.
Each conditional response probability (CRP) analysis involves calculating the probability of some type of transition event. For the lag-CRP analysis, the transition events of interest are the different lags between the serial positions of items recalled adjacent to one another. Similar analyses focus not on the serial position in which items were presented, but on the properties of the items themselves. A semantic-CRP analysis calculates the probability of transitions between items in different semantic relatedness bins. A special case of this analysis is when item pairs are placed into one of two bins, depending on whether they are in the same stimulus category or not. In Psifr, this is referred to as a category-CRP analysis.
Lag-CRP
In all CRP analyses, transition probabilities are calculated conditional on a given transition being available. For example, in a six-item list, if the items 6, 1, and 4 have been recalled, then possible items that could have been recalled next are 2, 3, or 5; therefore, possible lags at that point in the recall sequence are -2, -1, or +1. The number of actual transitions observed for each lag is divided by the number of times that lag was possible, to obtain the CRP for each lag.
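The counting described above can be sketched in plain Python for a single list. This is only an illustration of the idea, not Psifr's implementation; the helper names possible_lags and lag_crp_counts are hypothetical, and the sketch assumes that recalls contain no repeats or intrusions.

```python
from collections import Counter


def possible_lags(recalled, list_length):
    """Lags from the last recalled item to each item not yet recalled."""
    previous = recalled[-1]
    remaining = [i for i in range(1, list_length + 1) if i not in recalled]
    return [item - previous for item in remaining]


def lag_crp_counts(recalls, list_length):
    """Count actual and possible transitions at each lag for one list."""
    actual, possible = Counter(), Counter()
    for n in range(1, len(recalls)):
        # The transition that was actually made at this output position.
        actual[recalls[n] - recalls[n - 1]] += 1
        # Every transition that could have been made instead.
        possible.update(possible_lags(recalls[:n], list_length))
    return actual, possible


# The example from the text: items 6, 1, and 4 recalled from a six-item list.
print(possible_lags([6, 1, 4], 6))  # [-2, -1, 1]

# CRP for one hypothetical recall sequence: actual count / possible count.
actual, possible = lag_crp_counts([6, 1, 4, 5], 6)
crp = {lag: actual[lag] / possible[lag] for lag in sorted(possible)}
```

In this toy sequence, a lag of +1 was possible twice and taken once, so crp[1] is 0.5. Lags that were never possible are left out of the result rather than assigned a probability.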
First, load some sample data and create a merged DataFrame:
In [1]: from psifr import fr
In [2]: df = fr.sample_data('Morton2013')
In [3]: data = fr.merge_free_recall(df, study_keys=['category'])
Next, call lag_crp() to calculate conditional response probability as a function of lag.
In [4]: crp = fr.lag_crp(data)
In [5]: crp
Out[5]:
                   prob  actual  possible
subject lag
1       -23.0  0.020833       1        48
        -22.0  0.035714       3        84
        -21.0  0.026316       3       114
        -20.0  0.024000       3       125
        -19.0  0.014388       2       139
...                 ...     ...       ...
47       19.0  0.061224       3        49
         20.0  0.055556       2        36
         21.0  0.045455       1        22
         22.0  0.071429       1        14
         23.0  0.000000       0         6
[1880 rows x 3 columns]
The results show the count of times a given transition actually happened in the observed recall sequences (actual) and the number of times a transition could have occurred (possible). Finally, the prob column gives the estimated probability of a given transition occurring, calculated by dividing the actual count by the possible count.
Use plot_lag_crp() to display the results:
In [6]: g = fr.plot_lag_crp(crp)
The peaks at small lags (e.g., +1 and -1) indicate that the recall sequences show evidence of a temporal contiguity effect; that is, items presented near to one another in the list are more likely to be recalled successively than items that are distant from one another in the list.
Lag rank
We can summarize the tendency to group together nearby items using a lag rank analysis. For each recall, this determines the absolute lag of all remaining items available for recall and then calculates their percentile rank. Then the rank of the actual transition made is taken, scaled to vary between 0 (furthest item chosen) and 1 (nearest item chosen). Chance clustering will be 0.5; clustering above that value is evidence of a temporal contiguity effect.
In [7]: ranks = fr.lag_rank(data)
In [8]: ranks
Out[8]:
             rank
subject
1        0.610953
2        0.635676
3        0.612607
4        0.667090
5        0.643923
...           ...
43       0.554024
44       0.561005
45       0.598151
46       0.652748
47       0.621245
[40 rows x 1 columns]
In [9]: ranks.agg(['mean', 'sem'])
Out[9]:
          rank
mean  0.624699
sem   0.006732
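To make the rank calculation concrete, here is a minimal pure-Python sketch for a single list. It is not Psifr's code: the function names are hypothetical, ties between equally distant items are given their average rank (an assumption), and transitions with only one item left are skipped because no rank can be defined for them.

```python
def average_rank(values, target):
    """1-based rank of target among values, averaging ranks over ties."""
    below = sum(v < target for v in values)
    tied = sum(v == target for v in values)
    return below + (tied + 1) / 2


def lag_rank(recalls, list_length):
    """Mean temporal clustering score across the transitions in one list."""
    scores = []
    for n in range(1, len(recalls)):
        previous, current = recalls[n - 1], recalls[n]
        available = [
            i for i in range(1, list_length + 1) if i not in recalls[:n]
        ]
        if len(available) < 2:
            continue  # rank is undefined when only one item remains
        abs_lags = [abs(i - previous) for i in available]
        rank = average_rank(abs_lags, abs(current - previous))
        # Scale so that 1 means the nearest item and 0 the furthest item.
        scores.append(1 - (rank - 1) / (len(available) - 1))
    return sum(scores) / len(scores)


# Recalling a six-item list in serial order always picks the nearest item.
print(lag_rank([1, 2, 3, 4, 5, 6], 6))  # 1.0
```

A sequence such as 3, 4, 2 gives a lower score (0.6875 in this sketch), because the second transition skips over a nearer available item.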
Category CRP
If there are multiple categories or conditions of trials in a list, we can test whether participants tend to successively recall items from the same category. The category-CRP estimates the probability of successively recalling two items from the same category.
In [10]: cat_crp = fr.category_crp(data, category_key='category')
In [11]: cat_crp
Out[11]:
             prob  actual  possible
subject
1        0.801147     419       523
2        0.733456     399       544
3        0.763158     377       494
4        0.814882     449       551
5        0.877273     579       660
...           ...     ...       ...
43       0.809187     458       566
44       0.744376     364       489
45       0.763780     388       508
46       0.763573     436       571
47       0.806907     514       637
[40 rows x 3 columns]
In [12]: cat_crp[['prob']].agg(['mean', 'sem'])
Out[12]:
          prob
mean  0.782693
sem   0.006262
The expected probability due to chance depends on the number of categories in the list. In this case, there are three categories, so a category CRP of approximately 0.33 would be predicted if recalls were sampled randomly from the list.
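The conditional counting behind the category CRP can be sketched for a single list as follows. This is an illustrative sketch rather than Psifr's implementation; the function name and the category labels are hypothetical, and it assumes that a transition counts as possible only when at least one not-yet-recalled item shares the category of the item just recalled.

```python
def category_crp_counts(recalls, categories):
    """Count same-category transitions for one list.

    recalls: serial positions in output order (1-based, no repeats).
    categories: category label of each studied item, by serial position.
    """
    actual = possible = 0
    for n in range(1, len(recalls)):
        prev_cat = categories[recalls[n - 1] - 1]
        remaining = [
            i for i in range(1, len(categories) + 1) if i not in recalls[:n]
        ]
        # A same-category transition is possible only if such an item remains.
        if any(categories[i - 1] == prev_cat for i in remaining):
            possible += 1
            actual += categories[recalls[n] - 1] == prev_cat
    return actual, possible


# Hypothetical six-item list with three categories, two items each.
categories = ['a', 'b', 'c', 'a', 'b', 'c']
actual, possible = category_crp_counts([1, 2, 5, 4], categories)
print(actual / possible)  # 0.5
```

The last transition in this example (from item 5 to item 4) is not counted at all, because no other item from category 'b' remained to be recalled.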
Distance CRP
While the category CRP examines clustering based on semantic similarity at a coarse level (i.e., whether two items are in the same category or not), recall may also depend on more nuanced semantic relationships.
Models of semantic knowledge allow the semantic distance between pairs of items to be quantified. If you have such a model defined for your stimulus pool, you can use the distance CRP analysis to examine how semantic distance affects recall transitions.
You must first define distances between pairs of items. Here, we use correlation distances based on the wiki2USE model.
In [13]: items, distances = fr.sample_distances('Morton2013')
We also need a column indicating the index of each item in the distances matrix. We use pool_index() to create a new column called item_index with the index of each item in the pool corresponding to the distances matrix.
In [14]: data['item_index'] = fr.pool_index(data['item'], items)
Finally, we must define distance bins. Here, we use 10 bin edges at equally spaced distance percentiles (from the 1st to the 99th), which define 9 bins. Note that, when calculating distance percentiles, we use the squareform() function to get only the non-diagonal entries, with each pair of items counted once.
In [15]: import numpy as np
   ....: from scipy.spatial.distance import squareform
In [16]: edges = np.percentile(squareform(distances), np.linspace(1, 99, 10))
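As a small check on how percentile edges translate into bins, the following sketch uses a toy distance matrix for four hypothetical items. It uses only NumPy: for a symmetric matrix with a zero diagonal, taking the entries above the diagonal yields the same values that squareform() returns.

```python
import numpy as np

# Toy symmetric distance matrix for four hypothetical items (zero diagonal).
distances = np.array([
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.7],
    [0.5, 0.4, 0.0, 0.3],
    [0.9, 0.7, 0.3, 0.0],
])

# Unique pairwise distances: the entries above the diagonal, one per pair.
pairwise = distances[np.triu_indices_from(distances, k=1)]

# Ten percentile points from the 1st to the 99th give ten edges, which
# delimit nine bins.
edges = np.percentile(pairwise, np.linspace(1, 99, 10))
n_bins = len(edges) - 1
print(len(pairwise), len(edges), n_bins)  # 6 10 9
```

Because the edges start at the 1st percentile and stop at the 99th, the most extreme distances fall outside every bin.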
We can now calculate conditional response probability as a function of distance bin, to examine how response probability varies with semantic distance.
In [17]: dist_crp = fr.distance_crp(data, 'item_index', distances, edges)
In [18]: dist_crp
Out[18]:
                             bin      prob  actual  possible
subject center
1       0.467532  (0.352, 0.583]  0.085456     151      1767
        0.617748  (0.583, 0.653]  0.067916      87      1281
        0.673656  (0.653, 0.695]  0.062500      65      1040
        0.711075  (0.695, 0.727]  0.051836      48       926
        0.742069  (0.727, 0.757]  0.050633      44       869
...                          ...       ...     ...       ...
47      0.742069  (0.727, 0.757]  0.062822      61       971
        0.770867  (0.757, 0.785]  0.030682      27       880
        0.800404  (0.785, 0.816]  0.040749      37       908
        0.834473  (0.816, 0.853]  0.046651      39       836
        0.897275  (0.853, 0.941]  0.028868      25       866
[360 rows x 4 columns]
Use plot_distance_crp() to display the results:
In [19]: g = fr.plot_distance_crp(dist_crp).set(ylim=(0, 0.1))
Conditional response probability decreases with increasing semantic distance, suggesting that recall order was influenced by the semantic similarity between items. Of course, a complete analysis should address potential confounds such as the category structure of the list. See the Restricting analysis to specific items section for an example of restricting analysis based on category.
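The binning step of a distance CRP can be sketched in plain Python for a single list. This is a rough illustration under stated assumptions, not Psifr's implementation: the function name is hypothetical, bins are treated as half-open intervals that exclude their lower edge, every still-available item counts as one possible transition in its bin, and the studied list is assumed to contain every item in the distance matrix.

```python
from bisect import bisect_left


def distance_crp_counts(recalls, distances, edges):
    """Count actual and possible transitions per distance bin for one list.

    recalls: item indices in output order (0-based, no repeats).
    distances: square matrix (list of lists) of pairwise distances.
    edges: increasing bin edges; bin k covers (edges[k], edges[k + 1]].
    """
    n_bins = len(edges) - 1
    actual, possible = [0] * n_bins, [0] * n_bins

    def bin_of(d):
        k = bisect_left(edges, d) - 1
        return k if 0 <= k < n_bins else None  # None: outside all bins

    for n in range(1, len(recalls)):
        prev = recalls[n - 1]
        for item in range(len(distances)):
            if item in recalls[:n]:
                continue  # already recalled, so no longer available
            k = bin_of(distances[prev][item])
            if k is None:
                continue
            possible[k] += 1
            if item == recalls[n]:
                actual[k] += 1
    return actual, possible


# Toy distances for four hypothetical items and two bins: near and far.
toy_distances = [
    [0.0, 0.2, 0.5, 0.9],
    [0.2, 0.0, 0.4, 0.7],
    [0.5, 0.4, 0.0, 0.3],
    [0.9, 0.7, 0.3, 0.0],
]
actual, possible = distance_crp_counts([0, 1, 3], toy_distances, [0.0, 0.45, 1.0])
prob = [a / p for a, p in zip(actual, possible)]
```

Here the near bin has one actual transition out of two possible ones, and the far bin has one out of three.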
Distance rank
Similarly to the lag rank analysis of temporal clustering, we can summarize distance-based clustering (such as semantic clustering) with a single rank measure. The distance rank varies from 0 (the most-distant item is always recalled) to 1 (the closest item is always recalled), with chance clustering corresponding to 0.5.
In [20]: dist_rank = fr.distance_rank(data, 'item_index', distances)
In [21]: dist_rank.agg(['mean', 'sem'])
Out[21]:
          rank
mean  0.625932
sem   0.003466
Restricting analysis to specific items
Sometimes you may want to focus an analysis on a subset of recalls. For example, in order to exclude the period of high clustering commonly observed at the start of recall, lag-CRP analyses are sometimes restricted to transitions after the first three output positions.
You can restrict the recalls included in a transition analysis using the optional item_query argument. This is built on the Pandas query/eval system, which makes it possible to select rows of a DataFrame using a query string. This string can refer to any column in the data. Any items for which the expression evaluates to True will be included in the analysis.
For example, we can use the item_query argument to exclude any items recalled in the first three output positions from analysis. Note that, because non-recalled items have no output position, we need to include them explicitly using 'output > 3 or not recall'.
In [22]: crp_op3 = fr.lag_crp(data, item_query='output > 3 or not recall')
In [23]: g = fr.plot_lag_crp(crp_op3)
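To see why the "or not recall" clause is needed, the same query string can be tried directly with DataFrame.query on a toy table. The table below is hypothetical and only mimics the recall and output columns of merged data; how Psifr applies the query internally is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical merged data for one list: four studied items, three recalled.
toy = pd.DataFrame({
    'item': ['cat', 'dog', 'owl', 'fox'],
    'recall': [True, True, False, True],
    'output': [1.0, 4.0, np.nan, 2.0],
})

# NaN > 3 evaluates to False, so the non-recalled item is kept only
# because of the "or not recall" clause.
included = toy.query('output > 3 or not recall')
print(included['item'].tolist())  # ['dog', 'owl']

# Without the extra clause, the non-recalled item is dropped as well.
recalled_only = toy.query('output > 3')
print(recalled_only['item'].tolist())  # ['dog']
```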
Restricting analysis to specific transitions
In other cases, you may want to focus an analysis on a subset of transitions based on some criteria. For example, if a list contains items from different categories, it is a good idea to take this into account when measuring temporal clustering using a lag-CRP analysis. One approach is to separately analyze within- and across-category transitions.
Transitions can be selected for inclusion using the optional test_key and test inputs. The test_key indicates a column of the data to use for testing transitions; for example, here we will use the category column. The test input should be a function that takes in the test value of the previous recall and the current recall and returns True or False to indicate whether that transition should be included. Here, we will use a lambda (anonymous) function to define the test.
In [24]: crp_within = fr.lag_crp(data, test_key='category', test=lambda x, y: x == y)
In [25]: crp_across = fr.lag_crp(data, test_key='category', test=lambda x, y: x != y)
In [26]: import pandas as pd
   ....: crp_combined = pd.concat([crp_within, crp_across], keys=['within', 'across'], axis=0)
In [27]: crp_combined.index.set_names('transition', level=0, inplace=True)
In [28]: g = fr.plot_lag_crp(crp_combined, hue='transition').add_legend()
The within curve shows the lag-CRP for transitions between items of the same category, while the across curve shows transitions between items of different categories.
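The calling convention of a test function can be illustrated without Psifr. The sketch below applies a two-argument test to the test values of successive recalls; the helper name split_transitions and the category labels are hypothetical, and the sketch covers only the transitions that were actually made, not the full conditional analysis.

```python
def split_transitions(values, test):
    """Return (previous, current) pairs of test values that pass the test."""
    pairs = zip(values[:-1], values[1:])
    return [(x, y) for x, y in pairs if test(x, y)]


# Hypothetical category labels of successively recalled items.
recalled_categories = ['a', 'a', 'b', 'b', 'c']

within = split_transitions(recalled_categories, lambda x, y: x == y)
across = split_transitions(recalled_categories, lambda x, y: x != y)
print(within)  # [('a', 'a'), ('b', 'b')]
print(across)  # [('a', 'b'), ('b', 'c')]
```

Because the two tests are complementary, every transition lands in exactly one of the two groups.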