Comparing conditions
When analyzing a dataset, it’s often important to compare different
experimental conditions. Psifr is built on the Pandas DataFrame, which
has powerful ways of splitting data and applying operations to it.
This makes it possible to analyze and plot different conditions using
very little code.
Working with custom columns
First, load some sample data and create a merged DataFrame:
In [1]: from psifr import fr
In [2]: df = fr.sample_data('Morton2013')
In [3]: data = fr.merge_free_recall(
   ...:     df, study_keys=['category'], list_keys=['list_type']
   ...: )
In [4]: data.head()
Out[4]:
   subject  list      item  input  ...  list_type  category  prior_list  prior_input
0        1     1     TOWEL    1.0  ...       pure       obj         NaN          NaN
1        1     1     LADLE    2.0  ...       pure       obj         NaN          NaN
2        1     1   THERMOS    3.0  ...       pure       obj         NaN          NaN
3        1     1      LEGO    4.0  ...       pure       obj         NaN          NaN
4        1     1  BACKPACK    5.0  ...       pure       obj         NaN          NaN

[5 rows x 13 columns]
The merge_free_recall() function only includes columns from the raw data
if they are standard columns or if they have been explicitly included
using study_keys, recall_keys, or list_keys. list_keys apply to all events
in a list, while study_keys and recall_keys are relevant only for study
and recall events, respectively.

We’ve included a list key here to indicate that the list_type field
should be included for all study and recall events in each list, even
intrusions. The category field will be included for all study events
and all valid recalls. Intrusions will have an undefined category.
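To make the difference between a list key and a study key concrete, here is a minimal pandas-only sketch of the idea. It does not call Psifr and is not how merge_free_recall() is implemented; the toy events, the two-step merge, and the names study, recall, and list_info are assumptions made purely for illustration.

```python
import pandas as pd

# Toy study events: each studied item has a category (hypothetical data).
study = pd.DataFrame({
    'subject': [1, 1],
    'list': [1, 1],
    'item': ['TOWEL', 'LADLE'],
    'input': [1, 2],
    'category': ['obj', 'obj'],
})

# Toy recall events: SPOON was never studied, so it is an intrusion.
recall = pd.DataFrame({
    'subject': [1, 1, 1],
    'list': [1, 1, 1],
    'item': ['LADLE', 'TOWEL', 'SPOON'],
    'output': [1, 2, 3],
})

# List-level information, one row per list (assumed layout).
list_info = pd.DataFrame({'subject': [1], 'list': [1], 'list_type': ['pure']})

# A study-level field only reaches rows that match a study event.
merged = study.merge(recall, on=['subject', 'list', 'item'], how='outer')

# A list-level field is attached to every row of the list.
merged = merged.merge(list_info, on=['subject', 'list'], how='left')

intrusion = merged[merged['item'] == 'SPOON']
print(merged)
```

In this sketch the intrusion row has a missing category but still carries the list_type of its list, which mirrors the behavior described above.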
Analysis by condition
Now we can run any analysis separately for the different conditions. We’ll use the serial position curve analysis as an example.
In [5]: spc = data.groupby('list_type').apply(fr.spc)
In [6]: spc.head()
Out[6]:
             subject  input    recall
list_type
mixed     0        1    1.0  0.500000
          1        1    2.0  0.466667
          2        1    3.0  0.600000
          3        1    4.0  0.300000
          4        1    5.0  0.333333
The spc DataFrame has separate groups with the results for each
list_type.
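The split-apply pattern used here is plain pandas, so it can be tried without Psifr. The sketch below uses a made-up table and a hypothetical stand-in function, toy_spc, that computes mean recall by input position; it is not fr.spc, only an illustration of how groupby(...).apply(...) stacks per-condition results under an outer index level.

```python
import pandas as pd

# Toy merged data: one subject, two list types (hypothetical values).
merged = pd.DataFrame({
    'subject': [1] * 8,
    'list_type': ['pure'] * 4 + ['mixed'] * 4,
    'input': [1, 2, 1, 2, 1, 2, 1, 2],
    'recall': [True, False, True, True, False, False, True, False],
})


def toy_spc(df):
    """Stand-in analysis: recall probability by subject and input position."""
    return df.groupby(['subject', 'input'])['recall'].mean().reset_index()


# Run the analysis once per condition; the group label becomes the
# outer level of the result's index.
by_type = merged.groupby('list_type').apply(toy_spc)

# Select the results for a single condition.
pure = by_type.loc['pure']
print(by_type)
```

Any function that takes a DataFrame and returns a DataFrame can be substituted for toy_spc in this pattern.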
Warning
When using groupby with order-based analyses like lag_crp(), make sure
all recalls in all recall sequences for a given list have the same
label. Otherwise, you will be breaking up recall sequences, which could
result in an invalid analysis.
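One way to guard against this is to check, before grouping, that the label is constant within every list. The helper below is a hypothetical pandas-only sketch, not a Psifr function; it assumes the data identify lists by subject and list columns, as in the merged data shown earlier.

```python
import pandas as pd

# Toy events: list_type is constant within each list, category is not.
events = pd.DataFrame({
    'subject': [1, 1, 1, 1, 1, 1],
    'list': [1, 1, 1, 2, 2, 2],
    'list_type': ['pure', 'pure', 'pure', 'mixed', 'mixed', 'mixed'],
    'category': ['obj', 'obj', 'obj', 'obj', 'loc', 'cel'],
})


def label_is_constant_within_list(df, label):
    """Return True if every (subject, list) pair has exactly one label value.

    Missing values count as their own label, so a list containing
    intrusions with an undefined label is reported as not constant.
    """
    n_labels = df.groupby(['subject', 'list'])[label].nunique(dropna=False)
    return bool((n_labels == 1).all())


safe = label_is_constant_within_list(events, 'list_type')
unsafe = label_is_constant_within_list(events, 'category')
print(safe, unsafe)
```

Under these assumptions, grouping by list_type keeps each recall sequence intact, while grouping by category would split the sequences of the second list.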
Plotting by condition
We can then plot a separate curve for each condition. All plotting functions
take optional hue, col, col_wrap, and row inputs that can be used to divide
up data when plotting. Most inputs to seaborn.relplot() are supported.
For example, we can plot two curves for the different list types:
In [7]: g = fr.plot_spc(spc, hue='list_type').add_legend()
We can also plot the curves in different axes using the col option:
In [8]: g = fr.plot_spc(spc, col='list_type')
We can also plot all combinations of two conditions:
In [9]: spc_split = data.groupby(['list_type', 'category']).apply(fr.spc)
In [10]: g = fr.plot_spc(spc_split, col='list_type', row='category')
Plotting by subject
All analyses can be plotted separately by subject. A nice way to do this is
to use the optional col and col_wrap inputs to make a grid of plots with
6 columns per row:
In [11]: g = fr.plot_spc(
   ....:     spc, hue='list_type', col='subject', col_wrap=6, height=2
   ....: ).add_legend()