Comparing conditions

When analyzing a dataset, it’s often important to compare different experimental conditions. Psifr is built on the Pandas DataFrame, which has powerful ways of splitting data and applying operations to it. This makes it possible to analyze and plot different conditions using very little code.

Working with custom columns

First, load some sample data and create a merged DataFrame:

In [1]: from psifr import fr

In [2]: df = fr.sample_data('Morton2013')

In [3]: data = fr.merge_free_recall(
   ...:     df, study_keys=['category'], list_keys=['list_type']
   ...: )
   ...: 

In [4]: data.head()
Out[4]: 
    subject  list      item  input  ...  list_type  category  prior_list  prior_input
23        1     1     TOWEL    1.0  ...       pure       obj         NaN          NaN
8         1     1     LADLE    2.0  ...       pure       obj         NaN          NaN
22        1     1   THERMOS    3.0  ...       pure       obj         NaN          NaN
10        1     1      LEGO    4.0  ...       pure       obj         NaN          NaN
0         1     1  BACKPACK    5.0  ...       pure       obj         NaN          NaN

[5 rows x 13 columns]

The merge_free_recall() function only includes columns from the raw data if they are one of the standard columns or if they’ve explicitly been included using study_keys, recall_keys, or list_keys. list_keys apply to all events in a list, while study_keys and recall_keys are relevant only for study and recall events, respectively.

We’ve included a list key here to indicate that the list_type field should be included for all study and recall events in each list, including intrusions. The category field will be included for all study events and all valid recalls, while intrusions will have an undefined category.
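To make the difference between list keys and study keys concrete, here is a minimal pandas sketch built on a made-up list that contains one intrusion. It is not how merge_free_recall() is implemented. It only mimics the behavior described above, and it assumes a raw table with a trial_type column that marks study and recall events.

```python
import pandas as pd

# Made-up raw events for one list: two study events and three recalls.
# The last recall (KAYAK) is an intrusion; it was never studied.
raw = pd.DataFrame(
    {
        "subject": [1, 1, 1, 1, 1],
        "list": [1, 1, 1, 1, 1],
        "trial_type": ["study", "study", "recall", "recall", "recall"],
        "item": ["TOWEL", "LADLE", "LADLE", "TOWEL", "KAYAK"],
        "list_type": ["pure"] * 5,
        "category": ["obj", "obj", None, None, None],
    }
)
keys = ["subject", "list"]
study = raw[raw["trial_type"] == "study"]
recall = raw[raw["trial_type"] == "recall"].drop(columns=["list_type", "category"])

# A list-level field has one value per list and is copied to every event in it,
# including the intrusion.
list_info = raw.groupby(keys, as_index=False)["list_type"].first()
recall = recall.merge(list_info, on=keys, how="left")

# A study-level field is looked up from the matching study event, so the
# intrusion has no match and gets a missing value.
recall = recall.merge(study[keys + ["item", "category"]], on=keys + ["item"], how="left")
print(recall[["item", "list_type", "category"]])
```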

Analysis by condition

Now we can run any analysis separately for the different conditions. We’ll use the serial position curve analysis as an example.

In [5]: spc = data.set_index('list_type').groupby('list_type').apply(fr.spc)

In [6]: spc.head()
Out[6]: 
             subject  input    recall
list_type                            
mixed     0        1    1.0  0.500000
          1        1    2.0  0.466667
          2        1    3.0  0.600000
          3        1    4.0  0.300000
          4        1    5.0  0.333333

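Calling set_index before groupby avoids a deprecation warning. Starting with Pandas 2.2, DataFrameGroupBy.apply warns when the grouping columns are included in the data passed to the applied function. Moving list_type into the index keeps it out of the columns that each group receives.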
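The following sketch uses only pandas and a toy table to show why this works. Once list_type has been moved into the index, groupby can still group on that index level by name, but the level is no longer a column of the frame handed to the applied function. The mean_recall function is a hypothetical stand-in for an analysis like fr.spc().

```python
import pandas as pd

toy = pd.DataFrame(
    {"list_type": ["pure", "pure", "mixed"], "recall": [1.0, 0.0, 1.0]}
)


def mean_recall(group):
    # list_type is an index level here, not a column, so the applied
    # function never receives the grouping column.
    assert "list_type" not in group.columns
    return group["recall"].mean()


# groupby accepts an index level name in the same way as a column name.
result = toy.set_index("list_type").groupby("list_type").apply(mean_recall)
print(result)
```

In Pandas 2.2 and later, passing include_groups=False to apply() is another way to exclude the grouping columns.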

The spc DataFrame has list_type as its outermost index level, so the results for each list type form a separate group.
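To see what each group contains, here is a rough pandas approximation of a serial position curve by condition. For each list type and subject, it takes the proportion of lists in which the item at each input position was recalled. The sketch assumes the merged layout shown above (one row per studied item, with an input position and a boolean recall column) and uses made-up values. It is not Psifr's implementation of fr.spc().

```python
import pandas as pd

# Made-up merged data: one row per studied item, with its input position
# and whether it was recalled.
merged = pd.DataFrame(
    {
        "subject": [1] * 8,
        "list": [1, 1, 2, 2, 3, 3, 4, 4],
        "list_type": ["pure", "pure", "mixed", "mixed", "pure", "pure", "mixed", "mixed"],
        "input": [1, 2, 1, 2, 1, 2, 1, 2],
        "recall": [True, False, True, True, False, False, False, True],
    }
)

# Recall probability at each input position, computed separately for each
# condition and subject; list_type becomes the outermost index level.
spc_sketch = merged.groupby(["list_type", "subject", "input"])["recall"].mean()
print(spc_sketch)
```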

Warning

When using groupby with order-based analyses like lag_crp(), make sure that all recalls within a given list have the same label. Otherwise, recall sequences will be broken up across groups, which could result in an invalid analysis.
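Before running an order-based analysis, you can check this by counting the distinct labels within each list. The helper below, labels_are_constant, is hypothetical and not part of Psifr. It assumes the subject and list columns used in the merged data above.

```python
import pandas as pd


def labels_are_constant(data, column):
    """Hypothetical helper: True if each (subject, list) has one value of column."""
    # dropna=False counts a missing label (e.g., an intrusion) as a distinct value.
    n_labels = data.groupby(["subject", "list"])[column].nunique(dropna=False)
    return bool((n_labels == 1).all())


# Made-up example: one pure list and one mixed list.
toy = pd.DataFrame(
    {
        "subject": [1, 1, 1, 1],
        "list": [1, 1, 2, 2],
        "list_type": ["pure", "pure", "mixed", "mixed"],
        "category": ["obj", "obj", "obj", "loc"],
    }
)
print(labels_are_constant(toy, "list_type"))
print(labels_are_constant(toy, "category"))
```

In this toy table, list_type passes the check. category fails because the mixed list contains items from more than one category, so grouping an order-based analysis by category would split that list's recall sequences.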

Plotting by condition

We can then plot a separate curve for each condition. All plotting functions take optional hue, col, col_wrap, and row inputs that can be used to divide up data when plotting. Most inputs to seaborn.relplot() are supported.

For example, we can plot two curves for the different list types:

In [7]: g = fr.plot_spc(spc, hue='list_type').add_legend()

We can also plot the curves in different axes using the col option:

In [8]: g = fr.plot_spc(spc, col='list_type')

To plot all combinations of two conditions, group by both columns and then use the col and row options:

In [9]: spc_split = (
   ...:     data.set_index(['list_type', 'category'])
   ...:     .groupby(['list_type', 'category'])
   ...:     .apply(fr.spc)
   ...: )
   ...: 

In [10]: g = fr.plot_spc(spc_split, col='list_type', row='category')
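Grouping by two columns creates one group for each combination of values that appears in the data, and the result has one index level per grouping column. Here is a toy pandas example with made-up values that illustrates the grouping step on its own, without Psifr:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "list_type": ["pure", "pure", "mixed", "mixed", "mixed"],
        "category": ["obj", "loc", "obj", "loc", "loc"],
        "recall": [1.0, 0.0, 1.0, 1.0, 0.0],
    }
)

# One group per observed (list_type, category) combination; group keys are sorted.
cells = toy.groupby(["list_type", "category"])["recall"].mean()
print(cells)
```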

Plotting by subject

All analyses can be plotted separately by subject. A convenient way to do this is to use the col and col_wrap options to make a grid of plots with six columns per row:

In [11]: g = fr.plot_spc(
   ....:     spc, hue='list_type', col='subject', col_wrap=6, height=2
   ....: ).add_legend()
   ....: