When analyzing a dataset, it’s often important to compare different experimental conditions. Psifr is built on the Pandas DataFrame, which has powerful ways of splitting data and applying operations to it. This makes it possible to analyze and plot different conditions using very little code.
First, load some sample data and create a merged DataFrame:
In [1]: from psifr import fr

In [2]: df = fr.sample_data('Morton2013')

In [3]: data = fr.merge_free_recall(
   ...:     df, study_keys=['category'], list_keys=['list_type']
   ...: )
   ...:

In [4]: data.head()
Out[4]:
   subject  list      item  input  ...  repeat  intrusion  list_type  category
0        1     1     TOWEL    1.0  ...       0      False       pure       obj
1        1     1     LADLE    2.0  ...       0      False       pure       obj
2        1     1   THERMOS    3.0  ...       0      False       pure       obj
3        1     1      LEGO    4.0  ...       0      False       pure       obj
4        1     1  BACKPACK    5.0  ...       0      False       pure       obj

[5 rows x 11 columns]
The merge_free_recall() function includes a column from the raw data only if it is one of the standard columns or if it has been explicitly included using study_keys, recall_keys, or list_keys. list_keys apply to all events in a list, while study_keys and recall_keys are relevant only for study and recall events, respectively.
We’ve included a list key here, to indicate that the list_type field should be included for all study and recall events in each list, even intrusions. The category field will be included for all study events and all valid recalls. Intrusions will have an undefined category.
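One practical consequence of the undefined category for intrusions can be sketched with pandas alone. The table below is hand-written to resemble merged data; it is not actual Psifr output, and its rows and values are invented for illustration. Because pandas drops rows with a missing group key by default, grouping by category leaves intrusions out unless dropna=False is passed, while grouping by the list key keeps every row.

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking merged data: two valid recalls and one intrusion.
# list_type is a list key, so every row has it; category is a study key,
# so the intrusion (an item that was never studied) has no category.
toy = pd.DataFrame(
    {
        'subject': [1, 1, 1],
        'list': [1, 1, 1],
        'item': ['TOWEL', 'LADLE', 'PIANO'],
        'intrusion': [False, False, True],
        'list_type': ['pure', 'pure', 'pure'],
        'category': ['obj', 'obj', np.nan],
    }
)

# By default, pandas drops rows whose group key is missing.
counts_default = toy.groupby('category').size()

# Passing dropna=False keeps the intrusion in its own NaN group.
counts_all = toy.groupby('category', dropna=False).size()

# Grouping by the list key keeps every row, including the intrusion.
counts_list = toy.groupby('list_type').size()
```

Whether intrusions should be kept depends on the analysis; the sketch only shows where they go when a study key is used for grouping.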
Now we can run any analysis separately for the different conditions. We’ll use the serial position curve analysis as an example.
In [5]: spc = data.groupby('list_type').apply(fr.spc)

In [6]: spc.head()
Out[6]:
                           recall
list_type subject input
mixed     1       1.0    0.500000
                  2.0    0.466667
                  3.0    0.600000
                  4.0    0.300000
                  5.0    0.333333
The spc DataFrame has separate groups with the results for each list_type.
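Since the grouped results are indexed by list_type, subject, and input, standard pandas indexing can pull out or summarize individual conditions. The following is a minimal sketch on a hand-built table with the same index layout; the recall values are invented and do not come from the sample data.

```python
import pandas as pd

# Hypothetical results with the same index layout as the grouped output
# (list_type, subject, input); the recall values are invented.
index = pd.MultiIndex.from_product(
    [['mixed', 'pure'], [1, 2], [1.0, 2.0]],
    names=['list_type', 'subject', 'input'],
)
toy_spc = pd.DataFrame(
    {'recall': [0.5, 0.25, 0.75, 0.5, 0.25, 0.5, 0.75, 1.0]}, index=index
)

# Select the rows for one condition; the list_type level is dropped.
mixed = toy_spc.xs('mixed', level='list_type')

# Average over subjects within each condition and serial position.
mean_spc = toy_spc.groupby(['list_type', 'input'])['recall'].mean()
```

The same calls should apply to any result whose index carries the grouping variable as a named level.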
Warning
When using groupby with order-based analyses like lag_crp(), make sure all recalls in all recall sequences for a given list have the same label. Otherwise, you will be breaking up recall sequences, which could result in an invalid analysis.
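One way to guard against this is to check, before grouping, that each list carries a single label. The sketch below uses plain pandas on a hand-written table; the column name label and all values are hypothetical, chosen only to show the check.

```python
import pandas as pd

# Hypothetical merged-style rows; list 2 mixes two labels within one list.
toy = pd.DataFrame(
    {
        'subject': [1, 1, 1, 1],
        'list': [1, 1, 2, 2],
        'item': ['TOWEL', 'LADLE', 'PIANO', 'APPLE'],
        'label': ['pure', 'pure', 'a', 'b'],
    }
)

# Count distinct labels within each subject's list.
n_labels = toy.groupby(['subject', 'list'])['label'].nunique()

# Lists with more than one label would be split apart by groupby('label').
split_lists = n_labels[n_labels > 1]
```

An empty split_lists suggests that grouping by the label will leave recall sequences intact; here it flags list 2 of subject 1.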
We can then plot a separate curve for each condition. All plotting functions take optional hue, col, col_wrap, and row inputs that can be used to divide up data when plotting. See the Seaborn documentation for details. Most inputs to seaborn.relplot() are supported.
For example, we can plot two curves for the different list types:
In [7]: g = fr.plot_spc(spc, hue='list_type').add_legend()
We can also plot the curves in different axes using the col option:
In [8]: g = fr.plot_spc(spc, col='list_type')
We can also plot all combinations of two conditions:
In [9]: spc_split = data.groupby(['list_type', 'category']).apply(fr.spc)

In [10]: g = fr.plot_spc(spc_split, col='list_type', row='category')
All analyses can be plotted separately by subject. A convenient way to do this is to use the optional col and col_wrap inputs, which here make a grid of plots with 6 columns per row:
In [11]: g = fr.plot_spc(
   ....:     spc, hue='list_type', col='subject', col_wrap=6, height=2
   ....: ).add_legend()
   ....: