Comparing conditions

When analyzing a dataset, it’s often important to compare different experimental conditions. Psifr is built on the Pandas DataFrame, which has powerful ways of splitting data and applying operations to it. This makes it possible to analyze and plot different conditions using very little code.

Working with custom columns

First, load some sample data and create a merged DataFrame:

In [1]: from psifr import fr

In [2]: df = fr.sample_data('Morton2013')

In [3]: data = fr.merge_free_recall(
   ...:     df, study_keys=['category'], list_keys=['list_type']
   ...: )

In [4]: data.head()
   subject  list      item  input  ...  repeat  intrusion  list_type  category
0        1     1     TOWEL    1.0  ...       0      False       pure       obj
1        1     1     LADLE    2.0  ...       0      False       pure       obj
2        1     1   THERMOS    3.0  ...       0      False       pure       obj
3        1     1      LEGO    4.0  ...       0      False       pure       obj
4        1     1  BACKPACK    5.0  ...       0      False       pure       obj

[5 rows x 11 columns]

The merge_free_recall() function only includes a column from the raw data if it is one of the standard columns or if it has been explicitly included using study_keys, recall_keys, or list_keys. list_keys apply to all events in a list, while study_keys and recall_keys apply only to study and recall events, respectively.

We’ve included a list key here, to indicate that the list_type field should be included for all study and recall events in each list, even intrusions. The category field will be included for all study events and all valid recalls. Intrusions will have an undefined category.
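As a rough illustration of what this means for the merged table, the sketch below builds a tiny hand-made table with the same kind of layout and checks which labels are defined for intrusions. The toy rows and the PIANO intrusion are made up for illustration and are not taken from the sample data; only plain Pandas calls are used, so the same check should work on the real data variable.

```python
import pandas as pd

# Toy table laid out like a merged free recall table (values are made up).
toy = pd.DataFrame({
    'subject':   [1, 1, 1],
    'list':      [1, 1, 1],
    'item':      ['TOWEL', 'LADLE', 'PIANO'],
    'study':     [True, True, False],
    'recall':    [True, False, True],
    'intrusion': [False, False, True],
    'list_type': ['pure', 'pure', 'pure'],  # list key: defined for every row
    'category':  ['obj', 'obj', None],      # study key: undefined for the intrusion
})

# Select the intrusions and inspect their condition labels.
intrusions = toy[toy['intrusion']]
print(intrusions[['item', 'list_type', 'category']])

# The list-level label is present, while the item-level label is missing.
print(intrusions['list_type'].notna().all())
print(intrusions['category'].isna().all())
```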

Analysis by condition

Now we can run any analysis separately for the different conditions. We’ll use the serial position curve analysis as an example.

In [5]: spc = data.groupby('list_type').apply(fr.spc)

In [6]: spc.head()
list_type subject input          
mixed     1       1.0    0.500000
                  2.0    0.466667
                  3.0    0.600000
                  4.0    0.300000
                  5.0    0.333333

The spc DataFrame now has list_type as the outermost level of its index, so the results for each list type are stored as a separate group.
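To see how groupby and apply produce this layout, here is a minimal Pandas-only sketch. The recall_by_position function is a hypothetical stand-in for fr.spc, not Psifr's implementation, and the toy table is made up for illustration.

```python
import pandas as pd

# Toy table: one row per studied item, with a flag for whether it was recalled.
toy = pd.DataFrame({
    'subject':   [1, 1, 1, 1, 1, 1, 1, 1],
    'list':      [1, 1, 2, 2, 3, 3, 4, 4],
    'list_type': ['pure', 'pure', 'mixed', 'mixed',
                  'pure', 'pure', 'mixed', 'mixed'],
    'input':     [1, 2, 1, 2, 1, 2, 1, 2],
    'recall':    [True, False, True, True, True, True, False, True],
})


def recall_by_position(df):
    """Hypothetical stand-in for fr.spc: mean recall by subject and input."""
    return df.groupby(['subject', 'input'])['recall'].mean().to_frame()


# Run the analysis once per list type. The group label is prepended to the
# index of each result, giving a (list_type, subject, input) index.
columns = ['subject', 'input', 'recall']
toy_spc = toy.groupby('list_type')[columns].apply(recall_by_position)
print(toy_spc)
```

Selecting the analysis columns before calling apply keeps the grouping column out of the frame passed to the function, which avoids version-dependent behavior in Pandas.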


When using groupby with order-based analyses like lag_crp(), make sure that every recall in a given list's recall sequence has the same label. Otherwise, groupby will split recall sequences across groups, which could result in an invalid analysis.
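One way to check a grouping column before running an order-based analysis is to count the distinct labels within each list. The sketch below uses only Pandas; labels_per_list is a hypothetical helper, not part of Psifr, and the toy values are made up.

```python
import pandas as pd

# Toy table with a list-level label (list_type) and an item-level label
# (category). Values are made up for illustration.
toy = pd.DataFrame({
    'subject':   [1, 1, 1, 1, 1, 1],
    'list':      [1, 1, 1, 2, 2, 2],
    'list_type': ['pure', 'pure', 'pure', 'mixed', 'mixed', 'mixed'],
    'category':  ['obj', 'obj', 'obj', 'cel', 'loc', 'obj'],
})


def labels_per_list(df, key):
    """Count distinct values of key within each subject's list."""
    # dropna=False counts a missing label (e.g. on an intrusion) as its own
    # value, since those rows would also be separated from the sequence.
    return df.groupby(['subject', 'list'])[key].nunique(dropna=False)


# A column is safe to group by only if every list carries exactly one label.
safe_list_type = bool((labels_per_list(toy, 'list_type') == 1).all())
safe_category = bool((labels_per_list(toy, 'category') == 1).all())
print(safe_list_type, safe_category)
```

In this toy table, list_type passes the check, while category varies within the mixed list and so would break up that list's recall sequence.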

Plotting by condition

We can then plot a separate curve for each condition. All plotting functions take optional hue, col, col_wrap, and row inputs that can be used to divide up data when plotting. See the Seaborn documentation for details. Most inputs to seaborn.relplot() are supported.

For example, we can plot two curves for the different list types:

In [7]: g = fr.plot_spc(spc, hue='list_type').add_legend()

We can also plot the curves in different axes using the col option:

In [8]: g = fr.plot_spc(spc, col='list_type')

We can also plot all combinations of two conditions:

In [9]: spc_split = data.groupby(['list_type', 'category']).apply(fr.spc)

In [10]: g = fr.plot_spc(spc_split, col='list_type', row='category')

Plotting by subject

All analyses can be plotted separately by subject. One way to do this is to use the optional col and col_wrap inputs, which here make a grid of plots with 6 columns per row:

In [11]: g = fr.plot_spc(
   ....:     spc, hue='list_type', col='subject', col_wrap=6, height=2
   ....: ).add_legend()