Psifr documentation

In free recall, participants study a list of items and then name all of the items they can remember in any order they choose. Many sophisticated analyses have been developed for data from free recall experiments, but they are often complicated and difficult to implement.

Psifr leverages the Pandas data analysis package to make precise and flexible analysis of free recall data faster and easier.

Installation

First get a copy of the code from GitHub:

git clone git@github.com:mortonne/psifr.git

Then install:

cd psifr
python setup.py install

User guide

Importing data

In Psifr, free recall data are imported in the form of a “long” format table. Each row corresponds to one study or recall event. Study events include any time an item was presented to the participant. Recall events correspond to any recall attempt; this includes repeats of items that were already recalled and intrusions of items that were not present in the study list.

This type of information is well represented in a CSV spreadsheet, though any file format supported by pandas may be used for input. To import from a CSV, use pandas. For example:

import pandas as pd
data = pd.read_csv("my_data.csv")

Trial information

The basic information that must be included for each event is the following:

subject

Some code (numeric or string) indicating individual participants. Must be unique for a given experiment. For example, sub-101.

list

Numeric code indicating individual lists. Must be unique within subject.

trial_type

String indicating whether each event is a study event or a recall event.

position

Integer indicating position within a given phase of the list. For study events, this corresponds to input position (also referred to as serial position). For recall events, this corresponds to output position.

item

Individual thing being recalled, such as a word. May be specified with text (e.g., pumpkin, Jack Nicholson) or a numeric code (682, 121). Either way, the text or number must be unique to that item. Text is easier to read and requires no additional information to interpret, so it is preferred when available.

Example

Sample data

subject  list  trial_type  position  item
1        1     study       1         absence
1        1     study       2         hollow
1        1     study       3         pupil
1        1     recall      1         pupil
1        1     recall      2         absence
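As a rough illustration of the format, the sample data above can be written as CSV text and read with pandas, followed by a hand-rolled check that the five required columns are present. This is only a sketch: the CSV text is typed inline for the example, and the names csv_text, required, missing, and unexpected are made up here. The psifr.fr.check_data() function listed in the API reference is the library's own entry point for running checks.

```python
import io

import pandas as pd

# Sample data in long format: one row per study or recall event.
csv_text = """subject,list,trial_type,position,item
1,1,study,1,absence
1,1,study,2,hollow
1,1,study,3,pupil
1,1,recall,1,pupil
1,1,recall,2,absence
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

# Hand-rolled check (an illustration, not part of Psifr).
required = ['subject', 'list', 'trial_type', 'position', 'item']
missing = [col for col in required if col not in data.columns]
if missing:
    raise ValueError(f'Missing required columns: {missing}')

# trial_type should only contain the two expected labels.
unexpected = set(data['trial_type']) - {'study', 'recall'}
if unexpected:
    raise ValueError(f'Unexpected trial_type values: {unexpected}')

print(data)
```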

Additional information

Additional fields may be included in the data to indicate other aspects of the experiment, such as presentation time, stimulus category, experimental session, distraction length, etc. All of these fields can then be used for analysis in Psifr.

Scoring data

After importing free recall data, we have a DataFrame with a row for each study event and a row for each recall event. Next, we need to score the data by matching study events with recall events.

Scoring list recall

First, let’s create a simple sample dataset with two lists:

In [1]: import pandas as pd

In [2]: data = pd.DataFrame(
   ...:     {'subject': [1, 1, 1, 1, 1, 1,
   ...:                  1, 1, 1, 1, 1, 1],
   ...:      'list': [1, 1, 1, 1, 1, 1,
   ...:               2, 2, 2, 2, 2, 2],
   ...:      'trial_type': ['study', 'study', 'study',
   ...:                     'recall', 'recall', 'recall',
   ...:                     'study', 'study', 'study',
   ...:                     'recall', 'recall', 'recall'],
   ...:      'position': [1, 2, 3, 1, 2, 3,
   ...:                   1, 2, 3, 1, 2, 3],
   ...:      'item': ['absence', 'hollow', 'pupil',
   ...:               'pupil', 'absence', 'empty',
   ...:               'fountain', 'piano', 'pillow',
   ...:               'pillow', 'fountain', 'pillow']})
   ...: 

In [3]: data
Out[3]: 
    subject  list trial_type  position      item
0         1     1      study         1   absence
1         1     1      study         2    hollow
2         1     1      study         3     pupil
3         1     1     recall         1     pupil
4         1     1     recall         2   absence
5         1     1     recall         3     empty
6         1     2      study         1  fountain
7         1     2      study         2     piano
8         1     2      study         3    pillow
9         1     2     recall         1    pillow
10        1     2     recall         2  fountain
11        1     2     recall         3    pillow

Next, we’ll merge together the study and recall events by matching up corresponding events:

In [4]: from psifr import fr

In [5]: study = data.query('trial_type == "study"').copy()

In [6]: recall = data.query('trial_type == "recall"').copy()

In [7]: merged = fr.merge_lists(study, recall)

In [8]: merged
Out[8]: 
   subject  list      item  input  output  recalled  repeat  intrusion
0        1     1   absence    1.0     2.0      True       0      False
1        1     1    hollow    2.0     NaN     False       0      False
2        1     1     pupil    3.0     1.0      True       0      False
3        1     1     empty    NaN     3.0      True       0       True
4        1     2  fountain    1.0     2.0      True       0      False
5        1     2     piano    2.0     NaN     False       0      False
6        1     2    pillow    3.0     1.0      True       0      False
7        1     2    pillow    3.0     3.0      True       1      False

For each item, there is one row for each unique combination of input and output position. For example, if an item is presented once in the list, but is recalled multiple times, there is one row for each of the recall attempts. Repeated recalls are indicated by the repeat column, which is greater than zero for recalls of an item after the first.

Items that were not recalled have the recalled column set to False. Because they were not recalled, they have no defined output position, so output is set to NaN. Finally, intrusions have an output position but no input position because they did not appear in the list. There is an intrusion field for convenience to label these recall attempts.
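To make the logic of the merged table more concrete, the following sketch reproduces the same columns with a plain pandas outer merge. It is an illustration of the idea only, under the assumption that items are matched on subject, list, and item; it is not Psifr's implementation, and the row order of its result may differ from the output of fr.merge_lists.

```python
import pandas as pd

# Study and recall events from the two-list example above.
study = pd.DataFrame({
    'subject': 1,
    'list': [1, 1, 1, 2, 2, 2],
    'position': [1, 2, 3, 1, 2, 3],
    'item': ['absence', 'hollow', 'pupil', 'fountain', 'piano', 'pillow'],
})
recall = pd.DataFrame({
    'subject': 1,
    'list': [1, 1, 1, 2, 2, 2],
    'position': [1, 2, 3, 1, 2, 3],
    'item': ['pupil', 'absence', 'empty', 'pillow', 'fountain', 'pillow'],
})

# An outer merge keeps items that were studied but never recalled
# (output is NaN) and recalls that were never studied (input is NaN).
keys = ['subject', 'list', 'item']
merged = pd.merge(
    study.rename(columns={'position': 'input'}),
    recall.rename(columns={'position': 'output'}),
    on=keys,
    how='outer',
)

# Sort by output position so repeats are numbered in recall order.
merged = merged.sort_values(['subject', 'list', 'output'])
merged = merged.reset_index(drop=True)

merged['recalled'] = merged['output'].notna()
merged['intrusion'] = merged['input'].isna()
# 0 for the first recall of an item, 1 for the second, and so on.
merged['repeat'] = merged.groupby(keys).cumcount()

print(merged)
```

Because pillow was studied once and recalled twice, the merge produces two rows for it, one per input/output pair.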

Conditional response probability

A key advantage of free recall is that it provides information not only about what items are recalled, but also the order in which they are recalled. A number of analyses have been developed to characterize different influences on recall order, such as the temporal order in which the items were presented at study, the category of the items themselves, or the semantic similarity between pairs of items.

Each conditional response probability (CRP) analysis involves calculating the probability of some type of transition event. For the lag-CRP analysis, transition events of interest are the different lags between serial positions of items recalled adjacent to one another. Similar analyses focus not on the serial position in which items are presented, but the properties of the items themselves. A semantic-CRP analysis calculates the probability of transitions between items in different semantic relatedness bins. A special case of this analysis is when item pairs are placed into one of two bins, depending on whether they are in the same stimulus category or not. In Psifr, this is referred to as a category-CRP analysis.

Actual and possible transitions

Calculating a conditional response probability involves two parts: the frequency at which a given event actually occurred in the data and the frequency at which a given event could have occurred. The frequency of possible events is calculated conditional on the recalls that have been made leading up to each transition. For example, a transition between item \(i\) and item \(j\) is not considered “possible” in a CRP analysis if item \(i\) was never recalled. The transition is also not considered “possible” if, when item \(i\) is recalled, item \(j\) has already been recalled previously.

Repeated recall events are typically excluded from the counts of both actual and possible transition events. That is, the transition event frequencies are conditional on the transition not being either to or from a repeated item.

Calculating a CRP measure involves tallying how many transitions of a given type were made during a free recall test. For example, one common measure is the serial position lag between items. For a list of length \(N\), possible lags are in the range \([-N+1, N-1]\). Because repeats are excluded, a lag of zero is never possible. The count of actual and possible transitions for each lag is calculated first, and then the CRP for each lag is calculated as the actual count divided by the possible count.
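The counting described above can be sketched in a few lines of plain Python. This is a minimal illustration for a single recall sequence, not the library's code: tally_lags is a hypothetical helper written for this example, and it assumes serial positions start at 1 and that intrusions are coded with any value that is not a serial position (such as None).

```python
from collections import Counter


def tally_lags(list_length, recalls):
    """Tally actual and possible lags for one recall sequence.

    recalls holds serial positions in output order. Any value that is
    not a remaining serial position (an intrusion or a repeat) is masked.
    """
    pool = list(range(1, list_length + 1))  # items still available
    actual = Counter()
    possible = Counter()
    for prev, curr in zip(recalls, recalls[1:]):
        if prev not in pool:
            continue  # transition from an intrusion or a repeat
        pool.remove(prev)
        if curr not in pool:
            continue  # transition to an intrusion or a repeat
        actual[curr - prev] += 1
        # Every remaining item defines a lag that could have occurred.
        for item in pool:
            possible[item - prev] += 1
    return actual, possible


actual, possible = tally_lags(6, [6, 2, 3, 6, 1, 4])

# CRP for each lag is the actual count divided by the possible count.
crp = {lag: actual[lag] / possible[lag] for lag in sorted(possible)}
for lag, prob in crp.items():
    print(f"lag {lag:+d}: {actual[lag]}/{possible[lag]} = {prob:.2f}")
```

In this sequence, a lag of +3 was possible twice (after recalling item 2 and after recalling item 1) and was made once (from item 1 to item 4), giving a CRP of 0.5 at that lag.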

The transitions masker

The psifr.transitions.transitions_masker() function is a generator that makes it simple to iterate over transitions while “masking” out events such as intrusions of items not on the list and repeats of items that have already been recalled.

On each step of the iterator, the previous, current, and possible items are yielded. The previous item is the item being transitioned from. The current item is the item being transitioned to. The possible items are an array of all items that were valid to be recalled next, given the recall sequence up to that point (not including the current item).

In [1]: from psifr.transitions import transitions_masker

In [2]: pool = [1, 2, 3, 4, 5, 6]

In [3]: recs = [6, 2, 3, 6, 1, 4]

In [4]: masker = transitions_masker(pool_items=pool, recall_items=recs,
   ...:                             pool_output=pool, recall_output=recs)
   ...: 

In [5]: for prev, curr, poss in masker:
   ...:     print(prev, curr, poss)
   ...: 
6 2 [1 2 3 4 5]
2 3 [1 3 4 5]
1 4 [4 5]

Only valid transitions are yielded, so the code for a specific analysis only needs to calculate the transition measure of interest and count the number of actual and possible transitions in each bin of interest.

Four inputs are required:

pool_items

List of identifiers for all items available for recall. Identifiers can be anything that is unique to each item in the list (e.g., serial position, a string representation of the item, an index in the stimulus pool).

recall_items

List of identifiers for the sequence of recalls, in order. Valid recalls must match an item in pool_items. Other items are considered intrusions.

pool_output

Output codes for each item in the pool. This should be whatever you need to calculate your transition measure.

recall_output

Output codes for each recall in the sequence of recalls.

By using different values for these four inputs and defining different transition measures, a wide range of analyses can be implemented.
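As an example of changing the output codes, the sketch below counts within-category transitions for the same pool and recall sequence used earlier. To keep it self-contained, it uses simple_masker, a simplified pure-Python stand-in written for this example (it yields lists rather than arrays and omits the optional test inputs); it is not the library's function. The category labels are also made up for illustration.

```python
def simple_masker(pool_items, recall_items, pool_output, recall_output):
    """Yield (prev, curr, poss) output values for valid transitions."""
    pool_items = list(pool_items)
    pool_output = list(pool_output)
    for n in range(len(recall_items) - 1):
        # Skip transitions from intrusions or repeats.
        if recall_items[n] not in pool_items:
            continue
        # The previous item is no longer available for recall.
        ind = pool_items.index(recall_items[n])
        del pool_items[ind]
        del pool_output[ind]
        # Skip transitions to intrusions or repeats.
        if recall_items[n + 1] not in pool_items:
            continue
        yield recall_output[n], recall_output[n + 1], list(pool_output)


pool = [1, 2, 3, 4, 5, 6]
recs = [6, 2, 3, 6, 1, 4]

# Hypothetical category label for each item.
category = {1: 'a', 2: 'a', 3: 'a', 4: 'b', 5: 'b', 6: 'b'}
pool_cat = [category[i] for i in pool]
recs_cat = [category[i] for i in recs]

actual = 0
possible = 0
for prev, curr, poss in simple_masker(pool, recs, pool_cat, recs_cat):
    if prev == curr:
        actual += 1  # a within-category transition was made
    if prev in poss:
        possible += 1  # a within-category transition was available

print(actual, possible)
```

Item identifiers still determine which recalls are repeats or intrusions, while the category labels passed as output values determine the transition measure.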

Tutorials

See the psifr-notebooks project for sample code.

API Reference

Transitions

The transitions module contains utilities to iterate over and mask transitions between recalled items. The psifr.transitions.transitions_masker() function does most of the work here.

Module to analyze transitions during free recall.

psifr.transitions.count_category(pool_items, recall_items, pool_category, recall_category, pool_test=None, recall_test=None, test=None)

Count within-category transitions.

psifr.transitions.count_lags(pool_items, recall_items, pool_test=None, recall_test=None, test=None)

Count actual and possible serial position lags.

Parameters
  • pool_items (list) – List of the serial positions available for recall in each list. Must match the serial position codes used in recall_items.

  • recall_items (list) – List indicating the serial position of each recall in output order (NaN for intrusions).

  • pool_test (list, optional) – List of some test value for each item in the pool.

  • recall_test (list, optional) – List of some test value for each recall attempt by output position.

  • test (callable, optional) – Callable that evaluates each transition between items n and n+1. Must take test values for items n and n+1 and return True if a given transition should be included.

psifr.transitions.count_pairs(n_item, pool_items, recall_items, pool_test=None, recall_test=None, test=None)

Count transitions between pairs of specific items.

psifr.transitions.transitions_masker(pool_items, recall_items, pool_output, recall_output, pool_test=None, recall_test=None, test=None)

Iterate over transitions with masking.

Transitions are between a “previous” item and a “current” item. Non-included transitions will be skipped. A transition is yielded only if it matches the following conditions:

(1) Each item involved in the transition is in the pool. Items are removed from the pool after they appear as the previous item.

(2) Optionally, an additional check is run based on test values associated with the items in the transition. For example, this could be used to only include transitions where the category of the previous and current items is the same.

The masker will yield “output” values, which may be distinct from the item identifiers used to determine item repeats.

Parameters
  • pool_items (list) – Items available for recall. Order does not matter. May contain repeated values. Item identifiers must be unique within pool.

  • recall_items (list) – Recalled items in output position order.

  • pool_output (list) – Output values for pool items. Must be in the same order as pool_items.

  • recall_output (list) – Output values in output position order.

  • pool_test (list, optional) – Test values for items available for recall. Must be in the same order as pool_items.

  • recall_test (list, optional) – Test values for items in output position order.

  • test (callable, optional) –

    Used to test whether individual transitions should be included, based on test values.

    test(prev, curr) - test for included transition

    test(prev, poss) - test for included possible transition

Yields
  • prev (object) – Output value for the “from” item on this transition.

  • curr (object) – Output value for the “to” item.

  • poss (numpy.array) – Output values for all possible valid “to” items.

Free Recall Analysis

Utilities for working with free recall data.

psifr.fr.block_index(list_labels)

Get index of each block in a list.

psifr.fr.check_data(df)

Run checks on free recall data.

Parameters

df (pandas.DataFrame) –

Contains one row for each trial (study and recall). Must have fields:

  • subject (number or str) – Subject identifier.

  • list (number) – List identifier. This applies to both study and recall trials.

  • trial_type (str) – Type of trial; may be ‘study’ or ‘recall’.

  • position (number) – Position within the study list or recall sequence.

  • item (str) – Item that was either presented or recalled on this trial.

psifr.fr.get_recall_index(df, list_cols=None)

Get recall input position index by list.

psifr.fr.get_study_value(df, column, list_cols=None)

Get study column value by list.

psifr.fr.lag_crp(df, test_values=None, test=None, first_output=None)

Lag-CRP for multiple subjects.

Parameters
  • df (pandas.DataFrame) – Merged study and recall data. See merge_lists. List length is assumed to be the same for all lists within each subject. Must have fields: subject, list, input, output, recalled. Input position must be defined such that the first serial position is 1, not 0.

  • test_values (pandas.Series or column name, optional) – Column with labels to use when testing transitions for inclusion.

  • test (callable, optional) – Callable that takes in previous and current item values and returns True for transitions that should be included.

  • first_output (int, optional) – First output position to include when calculating transition probabilities. Used to exclude initial outputs. Default is to start at the first recall on each list.

Returns

results – Has fields:

  • subject (hashable) – Results are separated by each subject.

  • lag (int) – Lag of input position between two adjacent recalls.

  • prob (float) – Probability of each lag transition.

  • actual (int) – Total number of transitions actually made at each lag.

  • possible (int) – Total number of times each lag was possible, given the prior input position and the remaining items to be recalled.

Return type

pandas.DataFrame

psifr.fr.merge_lists(study, recall, merge_keys=None, list_keys=None, study_keys=None, recall_keys=None, position_key='position')

Merge study and recall events together for each list.

Parameters
  • study (pandas.DataFrame) – Information about all study events. Should have one row for each study event.

  • recall (pandas.DataFrame) – Information about all recall events. Should have one row for each recall attempt.

  • merge_keys (list, optional) – Columns to use to designate events to merge. Default is [‘subject’, ‘list’, ‘item’], which will merge events related to the same item, but only within list.

  • list_keys (list, optional) – Columns that apply to both study and recall events.

  • study_keys (list, optional) – Columns that only apply to study events.

  • recall_keys (list, optional) – Columns that only apply to recall events.

  • position_key (str, optional) – Column indicating the position of each item in either the study list or the recall sequence.

Returns

merged – Merged information about study and recall events. Each row corresponds to one unique input/output pair.

The following columns will be added:

  • input (int) – Position of each item in the input list (i.e., serial position).

  • output (int) – Position of each item in the recall sequence.

  • recalled (bool) – True for rows with an associated recall event.

  • repeat (int) – Number of times this recall event has been repeated (0 for the first recall of an item).

  • intrusion (bool) – True for recalls that do not correspond to any study event.

Return type

pandas.DataFrame