Psifr documentation¶
In free recall, participants study a list of items and then name all of the items they can remember, in any order they choose. Many sophisticated analyses have been developed for free recall data, but they are often complicated and difficult to implement.
Psifr leverages the Pandas data analysis package to make precise and flexible analysis of free recall data faster and easier.
See the code repository for version release notes.
Installation¶
You can install the latest stable version of Psifr using pip:
pip install psifr
You can also install the development version directly from the code repository on GitHub:
pip install git+https://github.com/mortonne/psifr
User guide¶
Importing data¶
In Psifr, free recall data are imported in the form of a “long” format table. Each row corresponds to one study or recall event. Study events include any time an item was presented to the participant. Recall events correspond to any recall attempt; this includes repeats of items that were already recalled and intrusions of items that were not present in the study list.
This type of information is well represented in a CSV spreadsheet, though any file format supported by pandas may be used for input. To import from a CSV, use pandas. For example:
import pandas as pd
data = pd.read_csv("my_data.csv")
Trial information¶
The basic information that must be included for each event is the following:
- subject
Some code (numeric or string) indicating individual participants. Must be unique for a given experiment. For example, sub-101.
- list
Numeric code indicating individual lists. Must be unique within subject.
- trial_type
String indicating whether each event is a study event or a recall event.
- position
Integer indicating position within a given phase of the list. For study events, this corresponds to input position (also referred to as serial position). For recall events, this corresponds to output position.
- item
Individual thing being recalled, such as a word. May be specified with text (e.g., pumpkin, Jack Nicholson) or a numeric code (682, 121). Either way, the text or number must be unique to that item. Text is easier to read and requires no additional information to interpret, so it is preferred when available.
Example¶
subject | list | trial_type | position | item
---|---|---|---|---
1 | 1 | study | 1 | absence
1 | 1 | study | 2 | hollow
1 | 1 | study | 3 | pupil
1 | 1 | recall | 1 | pupil
1 | 1 | recall | 2 | absence
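A table like this can be constructed directly in pandas; a minimal sketch:

```python
import pandas as pd

# Build the example study and recall events as a long-format table.
data = pd.DataFrame({
    "subject": [1, 1, 1, 1, 1],
    "list": [1, 1, 1, 1, 1],
    "trial_type": ["study", "study", "study", "recall", "recall"],
    "position": [1, 2, 3, 1, 2],
    "item": ["absence", "hollow", "pupil", "pupil", "absence"],
})
print(data)
```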
Additional information¶
Additional fields may be included in the data to indicate other aspects of the experiment, such as presentation time, stimulus category, experimental session, distraction length, etc. All of these fields can then be used for analysis in Psifr.
Scoring data¶
After importing free recall data, we have a DataFrame with a row for each study event and a row for each recall event. Next, we need to score the data by matching study events with recall events.
Scoring list recall¶
First, let’s create a simple sample dataset with two lists. We can use
the table_from_lists()
convenience function to create
a sample dataset with a given set of study lists and recalls:
In [1]: from psifr import fr
In [2]: list_subject = [1, 1]
In [3]: study_lists = [['absence', 'hollow', 'pupil'], ['fountain', 'piano', 'pillow']]
In [4]: recall_lists = [['pupil', 'absence', 'empty'], ['pillow', 'pupil', 'pillow']]
In [5]: data = fr.table_from_lists(list_subject, study_lists, recall_lists)
In [6]: data
Out[6]:
subject list trial_type position item
0 1 1 study 1 absence
1 1 1 study 2 hollow
2 1 1 study 3 pupil
3 1 1 recall 1 pupil
4 1 1 recall 2 absence
5 1 1 recall 3 empty
6 1 2 study 1 fountain
7 1 2 study 2 piano
8 1 2 study 3 pillow
9 1 2 recall 1 pillow
10 1 2 recall 2 pupil
11 1 2 recall 3 pillow
Next, we’ll merge together the study and recall events by matching up corresponding events:
In [7]: merged = fr.merge_free_recall(data)
In [8]: merged
Out[8]:
subject list item input ... repeat intrusion prior_list prior_input
0 1 1 absence 1.0 ... 0 False NaN NaN
1 1 1 hollow 2.0 ... 0 False NaN NaN
2 1 1 pupil 3.0 ... 0 False NaN NaN
3 1 1 empty NaN ... 0 True NaN NaN
4 1 2 fountain 1.0 ... 0 False NaN NaN
5 1 2 piano 2.0 ... 0 False NaN NaN
6 1 2 pillow 3.0 ... 0 False NaN NaN
7 1 2 pillow 3.0 ... 1 False NaN NaN
8 1 2 pupil NaN ... 0 True 1.0 3.0
[9 rows x 11 columns]
For each item, there is one row for each unique combination of input and
output position. For example, if an item is presented once in the list, but
is recalled multiple times, there is one row for each of the recall attempts.
Repeated recalls are indicated by the repeat
column, which is greater than
zero for recalls of an item after the first. Unique study events are indicated
by the study
column; this excludes intrusions and repeated recalls.
Items that were not recalled have the recall
column set to False
. Because
they were not recalled, they have no defined output position, so output
is
set to NaN
. Finally, intrusions have an output position but no input position
because they did not appear in the list. There is an intrusion
field for
convenience to label these recall attempts. The prior_list
and prior_input
fields give information about prior-list intrusions (PLIs) of items from prior
lists. The prior_list
field gives the list where the item appeared and
prior_input
indicates the position in which it was presented in that list.
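For example, intrusions and PLIs can be pulled out of the merged table with ordinary pandas operations. A sketch using a hand-built frame with a subset of the columns shown above:

```python
import numpy as np
import pandas as pd

# A miniature stand-in for the merged table shown above (subset of columns).
merged = pd.DataFrame({
    "subject": [1, 1, 1, 1],
    "list": [1, 1, 2, 2],
    "item": ["pupil", "empty", "pillow", "pupil"],
    "input": [3.0, np.nan, 3.0, np.nan],
    "intrusion": [False, True, False, True],
    "prior_list": [np.nan, np.nan, np.nan, 1.0],
})

# Intrusions have no input position; PLIs additionally have a prior_list.
intrusions = merged.query("intrusion")
plis = intrusions[intrusions["prior_list"].notna()]
print(intrusions[["list", "item"]])
```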
merge_free_recall()
can also handle additional attributes beyond
the standard ones, such as codes indicating stimulus category or list condition.
See Working with custom columns for details.
Filtering and sorting¶
Now that we have a merged DataFrame
, we can use Pandas methods to quickly
get different views of the data. For some analyses, we may want to organize the data in
terms of the study list by removing repeats and intrusions. Because our data
are in a DataFrame
, we can use the DataFrame.query
method:
In [9]: merged.query('study')
Out[9]:
subject list item input ... repeat intrusion prior_list prior_input
0 1 1 absence 1.0 ... 0 False NaN NaN
1 1 1 hollow 2.0 ... 0 False NaN NaN
2 1 1 pupil 3.0 ... 0 False NaN NaN
4 1 2 fountain 1.0 ... 0 False NaN NaN
5 1 2 piano 2.0 ... 0 False NaN NaN
6 1 2 pillow 3.0 ... 0 False NaN NaN
[6 rows x 11 columns]
Alternatively, we may also want to get just the recall events, sorted by output position instead of input position:
In [10]: merged.query('recall').sort_values(['list', 'output'])
Out[10]:
subject list item input ... repeat intrusion prior_list prior_input
2 1 1 pupil 3.0 ... 0 False NaN NaN
0 1 1 absence 1.0 ... 0 False NaN NaN
3 1 1 empty NaN ... 0 True NaN NaN
6 1 2 pillow 3.0 ... 0 False NaN NaN
8 1 2 pupil NaN ... 0 True 1.0 3.0
7 1 2 pillow 3.0 ... 1 False NaN NaN
[6 rows x 11 columns]
Note that we first sort by list, then output position, to keep the lists together.
Recall performance¶
First, load some sample data and create a merged DataFrame:
In [1]: from psifr import fr
In [2]: df = fr.sample_data('Morton2013')
In [3]: data = fr.merge_free_recall(df)
Raster plot¶
Raster plots can give you a quick overview of a whole dataset. We’ll look at all of the first subject’s recalls. This will plot every individual recall, colored by the serial position of the recalled item in the list. Items near the end of the list are shown in yellow, and items near the beginning of the list are shown in purple. Intrusions of items not on the list are shown in red.
In [4]: subj = fr.filter_data(data, 1)
In [5]: g = fr.plot_raster(subj).add_legend()
Serial position curve¶
We can calculate average recall for each serial position
using spc()
and plot using plot_spc()
.
In [6]: recall = fr.spc(data)
In [7]: g = fr.plot_spc(recall)
Using the same plotting function, we can plot the curve for each individual subject:
In [8]: g = fr.plot_spc(recall, col='subject', col_wrap=5)
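The quantity that spc() tabulates can also be sketched in plain pandas, which clarifies what the analysis computes. A toy example, with column names assumed to mirror the merged format above:

```python
import pandas as pd

# Toy merged-style data: one subject, two lists, a recall flag per studied item.
merged = pd.DataFrame({
    "subject": [1, 1, 1, 1, 1, 1],
    "list":    [1, 1, 1, 2, 2, 2],
    "input":   [1, 2, 3, 1, 2, 3],
    "recall":  [True, False, True, True, True, False],
})

# Average recall at each serial position, per subject: the serial position curve.
curve = merged.groupby(["subject", "input"])["recall"].mean()
print(curve)
```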
Probability of Nth recall¶
We can also split up recalls to test, for example, how likely participants were to initiate recall with the last item in the list.
In [9]: prob = fr.pnr(data)
In [10]: prob
Out[10]:
prob actual possible
subject output input
1 1 1 0.000000 0 48
2 0.020833 1 48
3 0.000000 0 48
4 0.000000 0 48
5 0.000000 0 48
... ... ... ...
47 24 20 NaN 0 0
21 NaN 0 0
22 NaN 0 0
23 NaN 0 0
24 NaN 0 0
[23040 rows x 3 columns]
This gives us the probability of recall by output position ('output'
)
and serial or input position ('input'
). This is a lot to look at all
at once, so it may be useful to plot just the first three output positions.
We can plot the curves using plot_spc()
, which takes an
optional hue
input to specify a variable to use to split the data
into curves of different colors.
In [11]: pfr = prob.query('output <= 3')
In [12]: g = fr.plot_spc(pfr, hue='output').add_legend()
This plot shows what items tend to be recalled early in the recall sequence.
Prior-list intrusions¶
Participants will sometimes accidentally recall items from prior lists; these recalls are known as prior-list intrusions (PLIs). To better understand how prior-list intrusions are happening, you can look at how many lists back those items were originally presented.
First, you need to choose a maximum list lag that you will consider. This determines which lists will be included in the analysis. For example, if you have a maximum lag of 3, then the first 3 lists will be excluded from the analysis. This ensures that each included list can potentially have intrusions of each possible list lag.
In [13]: pli = fr.pli_list_lag(data, max_lag=3)
In [14]: pli
Out[14]:
count per_list prob
subject list_lag
1 1 7 0.155556 0.259259
2 5 0.111111 0.185185
3 0 0.000000 0.000000
2 1 9 0.200000 0.191489
2 2 0.044444 0.042553
... ... ... ...
46 2 1 0.022222 0.100000
3 0 0.000000 0.000000
47 1 5 0.111111 0.277778
2 1 0.022222 0.055556
3 0 0.000000 0.000000
[120 rows x 3 columns]
In [15]: pli.groupby('list_lag').agg(['mean', 'sem'])
Out[15]:
count per_list prob
mean sem mean sem mean sem
list_lag
1 5.55 0.547664 0.123333 0.012170 0.210631 0.014726
2 1.35 0.230801 0.030000 0.005129 0.043458 0.007032
3 0.75 0.174496 0.016667 0.003878 0.023385 0.005602
The analysis returns a raw count of intrusions at each lag (count
),
the count divided by the number of included lists (per_list
), and the
probability of a given intrusion coming from a given lag (prob
). In
the sample dataset, recently presented items (i.e., with lower list lag) are
more likely to be intruded.
Recall order¶
A key advantage of free recall is that it provides information not only about what items are recalled, but also the order in which they are recalled. A number of analyses have been developed to characterize different influences on recall order, such as the temporal order in which the items were presented at study, the category of the items themselves, or the semantic similarity between pairs of items.
Each conditional response probability (CRP) analysis involves calculating the probability of some type of transition event. For the lag-CRP analysis, transition events of interest are the different lags between serial positions of items recalled adjacent to one another. Similar analyses focus not on the serial position in which items are presented, but the properties of the items themselves. A semantic-CRP analysis calculates the probability of transitions between items in different semantic relatedness bins. A special case of this analysis is when item pairs are placed into one of two bins, depending on whether they are in the same stimulus category or not. In Psifr, this is referred to as a category-CRP analysis.
Lag-CRP¶
In all CRP analyses, transition probabilities are calculated conditional on a given transition being available. For example, in a six-item list, if the items 6, 1, and 4 have been recalled, then possible items that could have been recalled next are 2, 3, or 5; therefore, possible lags at that point in the recall sequence are -2, -1, or +1. The number of actual transitions observed for each lag is divided by the number of times that lag was possible, to obtain the CRP for each lag.
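The availability logic can be made concrete with a few lines of plain Python, using serial positions to stand in for items:

```python
list_length = 6
recalled = [6, 1, 4]  # serial positions recalled so far

# Items still available for recall, given what has already been recalled.
available = [i for i in range(1, list_length + 1) if i not in recalled]

# Possible lags from the just-recalled item to each available item.
prev = recalled[-1]
possible_lags = sorted(p - prev for p in available)
print(available)
print(possible_lags)
```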
First, load some sample data and create a merged DataFrame:
In [1]: from psifr import fr
In [2]: df = fr.sample_data('Morton2013')
In [3]: data = fr.merge_free_recall(df, study_keys=['category'])
Next, call lag_crp()
to calculate conditional response
probability as a function of lag.
In [4]: crp = fr.lag_crp(data)
In [5]: crp
Out[5]:
prob actual possible
subject lag
1 -23.0 0.020833 1 48
-22.0 0.035714 3 84
-21.0 0.026316 3 114
-20.0 0.024000 3 125
-19.0 0.014388 2 139
... ... ... ...
47 19.0 0.061224 3 49
20.0 0.055556 2 36
21.0 0.045455 1 22
22.0 0.071429 1 14
23.0 0.000000 0 6
[1880 rows x 3 columns]
The results show the count of times a given transition actually happened
in the observed recall sequences (actual
) and the number of times a
transition could have occurred (possible
). Finally, the prob
column
gives the estimated probability of a given transition occurring, calculated
by dividing the actual count by the possible count.
Use plot_lag_crp()
to display the results:
In [6]: g = fr.plot_lag_crp(crp)
The peaks at small lags (e.g., +1 and -1) indicate that the recall sequences show evidence of a temporal contiguity effect; that is, items presented near to one another in the list are more likely to be recalled successively than items that are distant from one another in the list.
Lag rank¶
We can summarize the tendency to group together nearby items using a lag rank analysis. For each recall, this determines the absolute lag of all remaining items available for recall and then calculates their percentile rank. Then the rank of the actual transition made is taken, scaled to vary between 0 (furthest item chosen) and 1 (nearest item chosen). Chance clustering will be 0.5; clustering above that value is evidence of a temporal contiguity effect.
In [7]: ranks = fr.lag_rank(data)
In [8]: ranks
Out[8]:
rank
subject
1 0.610953
2 0.635676
3 0.612607
4 0.667090
5 0.643923
... ...
43 0.554024
44 0.561005
45 0.598151
46 0.652748
47 0.621245
[40 rows x 1 columns]
In [9]: ranks.agg(['mean', 'sem'])
Out[9]:
rank
mean 0.624699
sem 0.006732
Category CRP¶
If there are multiple categories or conditions of trials in a list, we can test whether participants tend to successively recall items from the same category. The category-CRP estimates the probability of successively recalling two items from the same category.
In [10]: cat_crp = fr.category_crp(data, category_key='category')
In [11]: cat_crp
Out[11]:
prob actual possible
subject
1 0.801147 419 523
2 0.733456 399 544
3 0.763158 377 494
4 0.814882 449 551
5 0.877273 579 660
... ... ... ...
43 0.809187 458 566
44 0.744376 364 489
45 0.763780 388 508
46 0.763573 436 571
47 0.806907 514 637
[40 rows x 3 columns]
In [12]: cat_crp[['prob']].agg(['mean', 'sem'])
Out[12]:
prob
mean 0.782693
sem 0.006262
The expected probability due to chance depends on the number of categories in the list. In this case, there are three categories, so a category CRP of 0.33 would be predicted if recalls were sampled randomly from the list.
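To check whether category clustering exceeds chance, one could run a one-sample t-test against 1/3. A sketch, assuming SciPy is available, using the first five subjects’ probabilities from the table above for illustration:

```python
from scipy import stats

# Per-subject category-CRP probabilities (first five subjects from the
# table above, for illustration only).
prob = [0.801147, 0.733456, 0.763158, 0.814882, 0.877273]

# One-sample t-test against the chance level of 1/3 for three categories.
result = stats.ttest_1samp(prob, 1 / 3)
print(result.statistic, result.pvalue)
```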
Distance CRP¶
While the category CRP examines clustering based on semantic similarity at a coarse level (i.e., whether two items are in the same category or not), recall may also depend on more nuanced semantic relationships.
Models of semantic knowledge allow the semantic distance between pairs of items to be quantified. If you have such a model defined for your stimulus pool, you can use the distance CRP analysis to examine how semantic distance affects recall transitions.
You must first define distances between pairs of items. Here, we use correlation distances based on the wiki2USE model.
In [13]: items, distances = fr.sample_distances('Morton2013')
We also need a column indicating the index of each item in the
distances matrix. We use pool_index()
to create
a new column called item_index
with the index of each item in
the pool corresponding to the distances matrix.
In [14]: data['item_index'] = fr.pool_index(data['item'], items)
Finally, we must define distance bins. Here, we use 10 bins with
equally spaced distance percentiles. Note that, when calculating
distance percentiles, we use the squareform()
function to
get only the non-diagonal entries.
In [15]: import numpy as np; from scipy.spatial.distance import squareform
In [16]: edges = np.percentile(squareform(distances), np.linspace(1, 99, 10))
We can now calculate conditional response probability as a function of distance bin, to examine how response probability varies with semantic distance.
In [17]: dist_crp = fr.distance_crp(data, 'item_index', distances, edges)
In [18]: dist_crp
Out[18]:
bin prob actual possible
subject center
1 0.467532 (0.352, 0.583] 0.085456 151 1767
0.617748 (0.583, 0.653] 0.067916 87 1281
0.673656 (0.653, 0.695] 0.062500 65 1040
0.711075 (0.695, 0.727] 0.051836 48 926
0.742069 (0.727, 0.757] 0.050633 44 869
... ... ... ... ...
47 0.742069 (0.727, 0.757] 0.062822 61 971
0.770867 (0.757, 0.785] 0.030682 27 880
0.800404 (0.785, 0.816] 0.040749 37 908
0.834473 (0.816, 0.853] 0.046651 39 836
0.897275 (0.853, 0.941] 0.028868 25 866
[360 rows x 4 columns]
Use plot_distance_crp()
to display the results:
In [19]: g = fr.plot_distance_crp(dist_crp).set(ylim=(0, 0.1))
Conditional response probability decreases with increasing semantic distance, suggesting that recall order was influenced by the semantic similarity between items. Of course, a complete analysis should address potential confounds such as the category structure of the list. See the Restricting analysis to specific items section for an example of restricting analysis based on category.
Distance rank¶
Similarly to the lag rank analysis of temporal clustering, we can summarize distance-based clustering (such as semantic clustering) with a single rank measure. The distance rank varies from 0 (the most-distant item is always recalled) to 1 (the closest item is always recalled), with chance clustering corresponding to 0.5.
In [20]: dist_rank = fr.distance_rank(data, 'item_index', distances)
In [21]: dist_rank.agg(['mean', 'sem'])
Out[21]:
rank
mean 0.625932
sem 0.003466
Restricting analysis to specific items¶
Sometimes you may want to focus an analysis on a subset of recalls. For example, in order to exclude the period of high clustering commonly observed at the start of recall, lag-CRP analyses are sometimes restricted to transitions after the first three output positions.
You can restrict the recalls included in a transition analysis using
the optional item_query
argument. This is built on the Pandas
query/eval system, which makes it possible to select rows of a
DataFrame
using a query string. This string can refer to any
column in the data. Any items for which the expression evaluates to
True
will be included in the analysis.
For example, we can use the item_query
argument to exclude any
items recalled in the first three output positions from analysis. Note
that, because non-recalled items have no output position, we need to
include them explicitly using output > 3 or not recall
.
In [22]: crp_op3 = fr.lag_crp(data, item_query='output > 3 or not recall')
In [23]: g = fr.plot_lag_crp(crp_op3)
Restricting analysis to specific transitions¶
In other cases, you may want to focus an analysis on a subset of transitions based on some criteria. For example, if a list contains items from different categories, it is a good idea to take this into account when measuring temporal clustering using a lag-CRP analysis. One approach is to separately analyze within- and across-category transitions.
Transitions can be selected for inclusion using the optional
test_key
and test
inputs. The test_key
indicates a column of the data to use for testing transitions; for
example, here we will use the category
column. The
test
input should be a function that takes in the test value
of the previous recall and the current recall and returns True or False
to indicate whether that transition should be included. Here, we will
use a lambda (anonymous) function to define the test.
In [24]: crp_within = fr.lag_crp(data, test_key='category', test=lambda x, y: x == y)
In [25]: crp_across = fr.lag_crp(data, test_key='category', test=lambda x, y: x != y)
In [26]: import pandas as pd; crp_combined = pd.concat([crp_within, crp_across], keys=['within', 'across'], axis=0)
In [27]: crp_combined.index = crp_combined.index.set_names('transition', level=0)
In [28]: g = fr.plot_lag_crp(crp_combined, hue='transition').add_legend()
The within
curve shows the lag-CRP for transitions between
items of the same category, while the across
curve shows
transitions between items of different categories.
Comparing conditions¶
When analyzing a dataset, it’s often important to compare different experimental conditions. Psifr is built on the Pandas DataFrame, which has powerful ways of splitting data and applying operations to it. This makes it possible to analyze and plot different conditions using very little code.
Working with custom columns¶
First, load some sample data and create a merged DataFrame:
In [1]: from psifr import fr
In [2]: df = fr.sample_data('Morton2013')
In [3]: data = fr.merge_free_recall(
...: df, study_keys=['category'], list_keys=['list_type']
...: )
...:
In [4]: data.head()
Out[4]:
subject list item input ... list_type category prior_list prior_input
0 1 1 TOWEL 1.0 ... pure obj NaN NaN
1 1 1 LADLE 2.0 ... pure obj NaN NaN
2 1 1 THERMOS 3.0 ... pure obj NaN NaN
3 1 1 LEGO 4.0 ... pure obj NaN NaN
4 1 1 BACKPACK 5.0 ... pure obj NaN NaN
[5 rows x 13 columns]
The merge_free_recall()
function only includes columns from the
raw data if they are one of the standard columns or if they’ve explicitly been
included using study_keys
, recall_keys
, or list_keys
.
list_keys
apply to all events in a list, while study_keys
and
recall_keys
are relevant only for study and recall events, respectively.
We’ve included a list key here, to indicate that the list_type
field should be included for all study and recall events in each list, even
intrusions. The category
field will be included for all study events
and all valid recalls. Intrusions will have an undefined category.
Analysis by condition¶
Now we can run any analysis separately for the different conditions. We’ll use the serial position curve analysis as an example.
In [5]: spc = data.groupby('list_type').apply(fr.spc)
In [6]: spc.head()
Out[6]:
recall
list_type subject input
mixed 1 1.0 0.500000
2.0 0.466667
3.0 0.600000
4.0 0.300000
5.0 0.333333
The spc
DataFrame has separate groups with the results for each
list_type
.
Warning
When using groupby
with order-based analyses like
lag_crp()
, make sure all recalls in all recall
sequences for a given list have the same label. Otherwise, you will
be breaking up recall sequences, which could result in an invalid
analysis.
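Before running an order-based analysis by group, it can help to verify the assumption the warning describes: that the label is constant within each list. A minimal pandas sketch with toy data (column names mirror those above):

```python
import pandas as pd

# Toy data: the condition label should not vary within a list.
data = pd.DataFrame({
    "subject":   [1, 1, 1, 1],
    "list":      [1, 1, 2, 2],
    "list_type": ["mixed", "mixed", "pure", "pure"],
})

# Each (subject, list) pair should carry exactly one label.
labels_per_list = data.groupby(["subject", "list"])["list_type"].nunique()
is_safe = bool((labels_per_list == 1).all())
print(is_safe)
```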
Plotting by condition¶
We can then plot a separate curve for each condition. All plotting functions
take optional hue
, col
, col_wrap
, and row
inputs that can be used to divide up data when plotting. See the
Seaborn documentation
for details. Most inputs to seaborn.relplot()
are supported.
For example, we can plot two curves for the different list types:
In [7]: g = fr.plot_spc(spc, hue='list_type').add_legend()
We can also plot the curves in different axes using the col
option:
In [8]: g = fr.plot_spc(spc, col='list_type')
We can also plot all combinations of two conditions:
In [9]: spc_split = data.groupby(['list_type', 'category']).apply(fr.spc)
In [10]: g = fr.plot_spc(spc_split, col='list_type', row='category')
Plotting by subject¶
All analyses can be plotted separately by subject. A nice way to do this is
using the col
and col_wrap
optional inputs, to make a grid
of plots with 6 columns per row:
In [11]: g = fr.plot_spc(
....: spc, hue='list_type', col='subject', col_wrap=6, height=2
....: ).add_legend()
....:
Tutorials¶
See the psifr-notebooks project for a set of Jupyter notebooks with sample code. These examples go more in depth into the options available for each analysis and how they can be used for advanced analyses such as conditionalizing CRP analysis on specific transitions.
API reference¶
Free recall analysis¶
Managing data¶
- table_from_lists(): Create table format data from list format data.
- check_data(): Run checks on free recall data.
- merge_free_recall(): Score free recall data by matching up study and recall events.
- merge_lists(): Merge study and recall events together for each list.
- filter_data(): Filter data to get a subset of trials.
- reset_list(): Reset list index in a DataFrame.
- split_lists(): Convert free recall data from one phase to split format.
- pool_index(): Get the index of each item in the full pool.
- block_index(): Get index of each block in a list.
Recall probability¶
- spc(): Serial position curve.
- pnr(): Probability of recall by serial position and output position.
Intrusions¶
- pli_list_lag(): List lag of prior-list intrusions.
Transition probability¶
- lag_crp(): Lag-CRP for multiple subjects.
- category_crp(): Conditional response probability of within-category transitions.
- distance_crp(): Conditional response probability by distance bin.
Transition rank¶
- lag_rank(): Calculate rank of the absolute lags in free recall lists.
- distance_rank(): Calculate rank of transition distances in free recall lists.
Plotting¶
- plot_raster(): Plot recalls in a raster plot.
- plot_spc(): Plot a serial position curve.
- plot_lag_crp(): Plot conditional response probability by lag.
- plot_distance_crp(): Plot response probability by distance bin.
- plot_swarm(): Plot points as a swarm plus mean with error bars.
Measures¶
Transition measure base class¶
- TransitionMeasure: Measure of free recall dataset with multiple subjects.
- TransitionMeasure.split_lists(): Get relevant fields and split by list.
- TransitionMeasure.analyze(): Analyze a free recall dataset with multiple subjects.
- TransitionMeasure.analyze_subject(): Analyze a single subject.
Transition measures¶
- TransitionOutputs: Measure recall probability by input and output position.
- TransitionLag: Measure conditional response probability by lag.
- TransitionLagRank: Measure lag rank of transitions.
- TransitionCategory: Measure conditional response probability by category transition.
- TransitionDistance: Measure conditional response probability by distance.
- TransitionDistanceRank: Measure transition rank by distance.
Transitions¶
Counting transitions¶
- count_lags(): Count actual and possible serial position lags.
- count_category(): Count within-category transitions.
- count_distance(): Count transitions within distance bins.
Ranking transitions¶
- percentile_rank(): Get percentile rank of a score compared to possible scores.
- rank_lags(): Calculate rank of absolute lag for free recall lists.
- rank_distance(): Calculate percentile rank of transition distances.
Iterating over transitions¶
- transitions_masker(): Iterate over transitions with masking.
Outputs¶
Counting recalls by serial position and output position¶
- count_outputs(): Count actual and possible recalls for each output position.
Iterating over output positions¶
- outputs_masker(): Iterate over valid outputs.
Development¶
Transitions¶
Psifr has a core set of tools for analyzing transitions in free recall data. These tools focus on measuring what transitions actually occurred, and which transitions were possible given the order in which participants recalled items.
Actual and possible transitions¶
Calculating a conditional response probability involves two parts: the frequency at which a given event actually occurred in the data and the frequency at which a given event could have occurred. The frequency of possible events is calculated conditional on the recalls that have been made leading up to each transition. For example, a transition between item \(i\) and item \(j\) is not considered “possible” in a CRP analysis if item \(i\) was never recalled. The transition is also not considered “possible” if, when item \(i\) is recalled, item \(j\) has already been recalled previously.
Repeated recall events are typically excluded from the counts of both actual and possible transition events. That is, the transition event frequencies are conditional on the transition not being either to or from a repeated item.
Calculating a CRP measure involves tallying how many transitions of a given type were made during a free recall test. For example, one common measure is the serial position lag between items. For a list of length \(N\), possible lags are in the range \([-N+1, N-1]\). Because repeats are excluded, a lag of zero is never possible. The count of actual and possible transitions for each lag is calculated first, and then the CRP for each lag is calculated as the actual count divided by the possible count.
The transitions masker¶
The psifr.transitions.transitions_masker()
is a generator that makes
it simple to iterate over transitions while “masking” out events such as
intrusions of items not on the list and repeats of items that have already
been recalled.
On each step of the iterator, the previous, current, and possible items are yielded. The previous item is the item being transitioned from. The current item is the item being transitioned to. The possible items are an array of all items that were valid to be recalled next, given the recall sequence up to that point (including the current item).
In [1]: from psifr.transitions import transitions_masker
In [2]: pool = [1, 2, 3, 4, 5, 6]
In [3]: recs = [6, 2, 3, 6, 1, 4]
In [4]: masker = transitions_masker(pool_items=pool, recall_items=recs,
...: pool_output=pool, recall_output=recs)
...:
In [5]: for prev, curr, poss in masker:
...: print(prev, curr, poss)
...:
6 2 [1 2 3 4 5]
2 3 [1 3 4 5]
1 4 [4 5]
Only valid transitions are yielded, so the code for a specific analysis only needs to calculate the transition measure of interest and count the number of actual and possible transitions in each bin of interest.
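As an illustration, the counting step can be sketched in plain Python. The masking function below is a simplified stand-in for transitions_masker(), not the library implementation:

```python
from collections import Counter

def masked_transitions(pool, recalls):
    """Yield (previous, current, possible) for valid transitions only,
    skipping intrusions and repeats (simplified sketch)."""
    available = list(pool)
    prev_valid = False
    prev_item = None
    for item in recalls:
        valid = item in available
        if valid:
            if prev_valid:
                # Possible items include everything not yet recalled,
                # including the current item.
                yield prev_item, item, list(available)
            available.remove(item)
        prev_valid = valid
        prev_item = item

pool = [1, 2, 3, 4, 5, 6]
recs = [6, 2, 3, 6, 1, 4]  # 6 is repeated; the repeat is masked out

actual = Counter()
possible = Counter()
for prev, curr, poss in masked_transitions(pool, recs):
    actual[curr - prev] += 1
    for p in poss:
        possible[p - prev] += 1

# CRP for each observed lag: actual count / possible count.
crp = {lag: actual[lag] / possible[lag] for lag in actual}
print(crp)
```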
Four inputs are required:
- pool_items
List of identifiers for all items available for recall. Identifiers can be anything that is unique to each item in the list (e.g., serial position, a string representation of the item, an index in the stimulus pool).
- recall_items
List of identifiers for the sequence of recalls, in order. Valid recalls must match an item in pool_items. Other items are considered intrusions.
- pool_output
Output codes for each item in the pool. This should be whatever you need to calculate your transition measure.
- recall_output
Output codes for each recall in the sequence of recalls.
By using different values for these four inputs and defining different transition measures, a wide range of analyses can be implemented.