Scoring data
After importing free recall data, we have a DataFrame with a row for each study event and a row for each recall event. Next, we need to score the data by matching study events with recall events.
Scoring list recall
First, let’s create a simple sample dataset with two lists. The table_from_lists() convenience function builds a dataset from a given set of study lists and recalls:
In [1]: from psifr import fr
In [2]: list_subject = [1, 1]
In [3]: study_lists = [['absence', 'hollow', 'pupil'], ['fountain', 'piano', 'pillow']]
In [4]: recall_lists = [['pupil', 'absence', 'empty'], ['pillow', 'pupil', 'pillow']]
In [5]: data = fr.table_from_lists(list_subject, study_lists, recall_lists)
In [6]: data
Out[6]:
subject list trial_type position item
0 1 1 study 1 absence
1 1 1 study 2 hollow
2 1 1 study 3 pupil
3 1 1 recall 1 pupil
4 1 1 recall 2 absence
5 1 1 recall 3 empty
6 1 2 study 1 fountain
7 1 2 study 2 piano
8 1 2 study 3 pillow
9 1 2 recall 1 pillow
10 1 2 recall 2 pupil
11 1 2 recall 3 pillow
Next, we’ll merge the study and recall events by matching up corresponding events using merge_free_recall(). This scoring and merging step labels each recall attempt as a correct recall, a repeat, or an intrusion. At the same time, it labels each study event according to whether the item was correctly recalled and, if so, at which output position. Free-recall analyses in Psifr are computed from data in this “merged” format.
In [7]: merged = fr.merge_free_recall(data)
In [8]: merged
Out[8]:
subject list item input ... repeat intrusion prior_list prior_input
0 1 1 absence 1.0 ... 0 False NaN NaN
1 1 1 hollow 2.0 ... 0 False NaN NaN
2 1 1 pupil 3.0 ... 0 False NaN NaN
3 1 1 empty NaN ... 0 True NaN NaN
4 1 2 fountain 1.0 ... 0 False NaN NaN
5 1 2 piano 2.0 ... 0 False NaN NaN
6 1 2 pillow 3.0 ... 0 False NaN NaN
7 1 2 pillow 3.0 ... 1 False NaN NaN
8 1 2 pupil NaN ... 0 True 1.0 3.0
[9 rows x 11 columns]
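The study-side half of this scoring can be illustrated in plain Python. This is a minimal sketch, not Psifr's implementation: the helper name score_study_items is hypothetical, and None stands in for the NaN that the merged table uses for a missing output position.

```python
# Hypothetical helper for illustration only; not part of the Psifr API.
def score_study_items(study, recalls):
    """Label each studied item with whether and when it was first recalled."""
    scored = []
    for input_pos, item in enumerate(study, start=1):
        # 1-based output position of the first recall of this item, else None.
        output = next(
            (pos for pos, rec in enumerate(recalls, start=1) if rec == item),
            None,
        )
        scored.append(
            {
                "item": item,
                "input": input_pos,
                "recall": output is not None,
                "output": output,  # None stands in for NaN here
            }
        )
    return scored


# First list from the sample dataset above.
study_rows = score_study_items(
    ["absence", "hollow", "pupil"], ["pupil", "absence", "empty"]
)
for row in study_rows:
    print(row)
```

In this sketch, absence is scored as recalled at output position 2, pupil at output position 1, and hollow as not recalled.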
In the merged data, each item has one row for each unique combination of input and output position. For example, if an item is presented once in the list but is recalled multiple times, there is one row for each of the recall attempts. Repeated recalls are indicated by the repeat column, which is greater than zero for recalls of an item after the first. Unique study events are indicated by the study column; this excludes intrusions and repeated recalls.
Items that were not recalled have the recall column set to False. Because they were not recalled, they have no defined output position, so output is set to NaN. Finally, intrusions have an output position but no input position because they did not appear in the list. There is an intrusion field for convenience to label these recall attempts. The prior_list and prior_input fields give information about prior-list intrusions (PLIs), which are recalls of items that were presented on earlier lists. The prior_list field gives the list where the item appeared, and prior_input indicates the position in which it was presented on that list.
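The recall-side labels described above can be sketched in plain Python to make the definitions concrete. This is an illustrative sketch under stated assumptions, not Psifr's implementation: label_recalls is a hypothetical helper, None stands in for NaN, and when an intruded item appeared on more than one earlier list the sketch assumes the most recent one is reported.

```python
# Hypothetical helper for illustration only; not part of the Psifr API.
def label_recalls(study_lists, recall_lists):
    """Label each recall attempt with repeat, intrusion, and prior-list fields."""
    rows = []
    for list_idx, (study, recalls) in enumerate(zip(study_lists, recall_lists)):
        times_recalled = {}  # item -> number of earlier recalls in this list
        for output, item in enumerate(recalls, start=1):
            repeat = times_recalled.get(item, 0)
            times_recalled[item] = repeat + 1
            intrusion = item not in study
            prior_list = prior_input = None
            if intrusion:
                # Assumption: search earlier lists, most recent first.
                for prev_idx in range(list_idx - 1, -1, -1):
                    if item in study_lists[prev_idx]:
                        prior_list = prev_idx + 1
                        prior_input = study_lists[prev_idx].index(item) + 1
                        break
            rows.append(
                {
                    "list": list_idx + 1,
                    "item": item,
                    "output": output,
                    "repeat": repeat,
                    "intrusion": intrusion,
                    "prior_list": prior_list,
                    "prior_input": prior_input,
                }
            )
    return rows


study_lists = [["absence", "hollow", "pupil"], ["fountain", "piano", "pillow"]]
recall_lists = [["pupil", "absence", "empty"], ["pillow", "pupil", "pillow"]]
rows = label_recalls(study_lists, recall_lists)
for row in rows:
    print(row)
```

For the sample data, the sketch marks empty as an intrusion with no prior list, pupil on list 2 as a prior-list intrusion from list 1, input position 3, and the second recall of pillow as a repeat. These labels agree with the merged table shown earlier.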
merge_free_recall() can also handle additional attributes beyond the standard ones, such as codes indicating stimulus category or list condition. See Working with custom columns for details.
Filtering and sorting
Now that we have a merged DataFrame, we can use Pandas methods to quickly get different views of the data. For some analyses, we may want to organize the data in terms of the study list by removing repeats and intrusions. Because our data are in a DataFrame, we can use the query() method:
In [9]: merged.query('study')
Out[9]:
subject list item input ... repeat intrusion prior_list prior_input
0 1 1 absence 1.0 ... 0 False NaN NaN
1 1 1 hollow 2.0 ... 0 False NaN NaN
2 1 1 pupil 3.0 ... 0 False NaN NaN
4 1 2 fountain 1.0 ... 0 False NaN NaN
5 1 2 piano 2.0 ... 0 False NaN NaN
6 1 2 pillow 3.0 ... 0 False NaN NaN
[6 rows x 11 columns]
Alternatively, we may want just the recall events, sorted by output position instead of input position:
In [10]: merged.query('recall').sort_values(['list', 'output'])
Out[10]:
subject list item input ... repeat intrusion prior_list prior_input
2 1 1 pupil 3.0 ... 0 False NaN NaN
0 1 1 absence 1.0 ... 0 False NaN NaN
3 1 1 empty NaN ... 0 True NaN NaN
6 1 2 pillow 3.0 ... 0 False NaN NaN
8 1 2 pupil NaN ... 0 True 1.0 3.0
7 1 2 pillow 3.0 ... 1 False NaN NaN
[6 rows x 11 columns]
Note that we first sort by list, then output position, to keep the lists together.
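The effect of the sort order can be seen in a small plain-Python sketch. It uses a hand-written list of dictionaries as a stand-in for the recall rows of the merged table (an assumption made for illustration; the real data are in a DataFrame) and compares sorting by output position alone with sorting by list and then output position.

```python
# Hand-written stand-in for the recall rows of the merged table.
recall_rows = [
    {"list": 1, "item": "absence", "output": 2},
    {"list": 1, "item": "pupil", "output": 1},
    {"list": 1, "item": "empty", "output": 3},
    {"list": 2, "item": "pillow", "output": 1},
    {"list": 2, "item": "pillow", "output": 3},
    {"list": 2, "item": "pupil", "output": 2},
]

# Sorting by output position alone interleaves the two lists.
by_output = sorted(recall_rows, key=lambda r: r["output"])

# Sorting by (list, output) keeps each list's recalls together, in order.
by_list_then_output = sorted(recall_rows, key=lambda r: (r["list"], r["output"]))

print([r["list"] for r in by_output])
print([r["item"] for r in by_list_then_output])
```

With only the output key, the list numbers alternate between 1 and 2; with both keys, the items come out in the same order as in the sorted table above.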
In addition to using the query() method directly, we can also use filter_data() to get subsets of data. For example, to get the first list only:
In [11]: fr.filter_data(merged, lists=1)
Out[11]:
subject list item input ... repeat intrusion prior_list prior_input
0 1 1 absence 1.0 ... 0 False NaN NaN
1 1 1 hollow 2.0 ... 0 False NaN NaN
2 1 1 pupil 3.0 ... 0 False NaN NaN
3 1 1 empty NaN ... 0 True NaN NaN
[4 rows x 11 columns]