Scoring data
After importing free recall data, we have a DataFrame with a row for each study event and a row for each recall event. Next, we need to score the data by matching study events with recall events.
Scoring list recall
First, let’s create a simple sample dataset with two lists:
In [1]: import pandas as pd
In [2]: data = pd.DataFrame({
   ...:     'subject': [
   ...:         1, 1, 1, 1, 1, 1,
   ...:         1, 1, 1, 1, 1, 1,
   ...:     ],
   ...:     'list': [
   ...:         1, 1, 1, 1, 1, 1,
   ...:         2, 2, 2, 2, 2, 2,
   ...:     ],
   ...:     'trial_type': [
   ...:         'study', 'study', 'study', 'recall', 'recall', 'recall',
   ...:         'study', 'study', 'study', 'recall', 'recall', 'recall',
   ...:     ],
   ...:     'position': [
   ...:         1, 2, 3, 1, 2, 3,
   ...:         1, 2, 3, 1, 2, 3,
   ...:     ],
   ...:     'item': [
   ...:         'absence', 'hollow', 'pupil', 'pupil', 'absence', 'empty',
   ...:         'fountain', 'piano', 'pillow', 'pillow', 'fountain', 'pillow',
   ...:     ],
   ...: })
   ...:
In [3]: data
Out[3]:
    subject  list  trial_type  position      item
0         1     1       study         1   absence
1         1     1       study         2    hollow
2         1     1       study         3     pupil
3         1     1      recall         1     pupil
4         1     1      recall         2   absence
5         1     1      recall         3     empty
6         1     2       study         1  fountain
7         1     2       study         2     piano
8         1     2       study         3    pillow
9         1     2      recall         1    pillow
10        1     2      recall         2  fountain
11        1     2      recall         3    pillow
Next, we’ll merge the study and recall events by matching each recall with the study event for the same item in the same list:
In [4]: from psifr import fr
In [5]: merged = fr.merge_free_recall(data)
In [6]: merged
Out[6]:
   subject  list      item  input  output  study  recall  repeat  intrusion
0        1     1   absence    1.0     2.0   True    True       0      False
1        1     1    hollow    2.0     NaN   True   False       0      False
2        1     1     pupil    3.0     1.0   True    True       0      False
3        1     1     empty    NaN     3.0  False    True       0       True
4        1     2  fountain    1.0     2.0   True    True       0      False
5        1     2     piano    2.0     NaN   True   False       0      False
6        1     2    pillow    3.0     1.0   True    True       0      False
7        1     2    pillow    3.0     3.0  False    True       1      False
For each item, there is one row for each unique combination of input and output position. For example, if an item is presented once in the list but recalled multiple times, there is one row for each recall attempt. Repeated recalls are indicated by the repeat column, which is greater than zero for recalls of an item after the first. In the second list above, pillow was recalled at output positions 1 and 3, so it has two rows, and the second has repeat set to 1. Unique study events are indicated by the study column; this excludes intrusions and repeated recalls.
Items that were not recalled have the recall column set to False. Because they were not recalled, they have no defined output position, so output is set to NaN. Finally, intrusions have an output position but no input position because they did not appear in the list. For convenience, the intrusion column labels these recall attempts.
merge_free_recall() can also handle additional attributes beyond the standard ones, such as codes indicating stimulus category or list condition. See Working with custom columns for details.
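To make the matching rule concrete, here is a minimal sketch in plain Python of how the rows for a single list could be built. It is an illustration only, not the implementation used by merge_free_recall(); the function name merge_list is hypothetical, and the sketch assumes that each item is studied at most once per list.

```python
def merge_list(study_items, recall_items):
    """Match the study and recall events of one list (illustrative only)."""
    # Collect the output positions of every recall attempt, keyed by item.
    outputs = {}
    for output_pos, item in enumerate(recall_items, start=1):
        outputs.setdefault(item, []).append(output_pos)

    rows = []
    for input_pos, item in enumerate(study_items, start=1):
        positions = outputs.pop(item, [])
        if not positions:
            # Studied but never recalled: no defined output position.
            rows.append(dict(item=item, input=input_pos, output=None,
                             study=True, recall=False, repeat=0,
                             intrusion=False))
        for repeat, output_pos in enumerate(positions):
            # Only the first recall of an item is a unique study event.
            rows.append(dict(item=item, input=input_pos, output=output_pos,
                             study=repeat == 0, recall=True, repeat=repeat,
                             intrusion=False))

    # Whatever is left was recalled but never studied: an intrusion.
    for item, positions in outputs.items():
        for repeat, output_pos in enumerate(positions):
            rows.append(dict(item=item, input=None, output=output_pos,
                             study=False, recall=True, repeat=repeat,
                             intrusion=True))
    return rows


# The two lists from the sample dataset above.
list1 = merge_list(['absence', 'hollow', 'pupil'],
                   ['pupil', 'absence', 'empty'])
list2 = merge_list(['fountain', 'piano', 'pillow'],
                   ['pillow', 'fountain', 'pillow'])
for row in list1 + list2:
    print(row)
```

The printed rows follow the same order and carry the same values as the merged table above, with None standing in for NaN.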
Filtering and sorting
Now that we have a merged DataFrame, we can use pandas methods to quickly get different views of the data. For some analyses, we may want to organize in terms of the study list by removing repeats and intrusions. Because our data are in a DataFrame, we can use the DataFrame.query method:
In [7]: merged.query('study')
Out[7]:
   subject  list      item  input  output  study  recall  repeat  intrusion
0        1     1   absence    1.0     2.0   True    True       0      False
1        1     1    hollow    2.0     NaN   True   False       0      False
2        1     1     pupil    3.0     1.0   True    True       0      False
4        1     2  fountain    1.0     2.0   True    True       0      False
5        1     2     piano    2.0     NaN   True   False       0      False
6        1     2    pillow    3.0     1.0   True    True       0      False
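As one example of a study-based view, the proportion of studied items that were recalled can be computed from these rows with standard pandas methods. The sketch below retypes the relevant columns of the table above by hand so that it runs without psifr; the variable names are ours, and the calculation is only an illustration, not a psifr function.

```python
import pandas as pd

# Study rows of the merged table, retyped by hand (illustrative copy).
study = pd.DataFrame({
    'subject': [1, 1, 1, 1, 1, 1],
    'list': [1, 1, 1, 2, 2, 2],
    'item': ['absence', 'hollow', 'pupil', 'fountain', 'piano', 'pillow'],
    'input': [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
    'recall': [True, False, True, True, False, True],
})

# The mean of a boolean column is the proportion of True values,
# so this gives the proportion of studied items recalled in each list.
recall_by_list = study.groupby(['subject', 'list'])['recall'].mean()
print(recall_by_list)
```

Two of the three studied items were recalled in each list, so both values come out to 2/3. Because repeats and intrusions were already filtered out, each studied item is counted exactly once.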
Alternatively, we may want just the recall events, sorted by output position instead of input position:
In [8]: merged.query('recall').sort_values(['list', 'output'])
Out[8]:
   subject  list      item  input  output  study  recall  repeat  intrusion
2        1     1     pupil    3.0     1.0   True    True       0      False
0        1     1   absence    1.0     2.0   True    True       0      False
3        1     1     empty    NaN     3.0  False    True       0       True
6        1     2    pillow    3.0     1.0   True    True       0      False
4        1     2  fountain    1.0     2.0   True    True       0      False
7        1     2    pillow    3.0     3.0  False    True       1      False
Note that we first sort by list, then output position, to keep the lists together.
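To see why the list column comes first, we can compare the two sort orders on the recall rows. This sketch uses only pandas and retypes the list, item, and output columns by hand; the variable names are illustrative.

```python
import pandas as pd

# Recall rows from the merged table, retyped by hand (illustrative copy).
recalls = pd.DataFrame({
    'list': [1, 1, 1, 2, 2, 2],
    'item': ['absence', 'pupil', 'empty', 'fountain', 'pillow', 'pillow'],
    'output': [2.0, 1.0, 3.0, 2.0, 1.0, 3.0],
})

# Sorting by output alone interleaves recalls from the two lists.
# mergesort is stable, so tied rows keep their original order.
by_output = recalls.sort_values('output', kind='mergesort')

# Sorting by list first keeps each list's recalls together.
by_list = recalls.sort_values(['list', 'output'])

print(by_output['list'].tolist())  # [1, 2, 1, 2, 1, 2]
print(by_list['list'].tolist())    # [1, 1, 1, 2, 2, 2]
```

With more than one subject, the same reasoning suggests placing subject before list in the sort keys.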