psifr.fr.category_clustering

psifr.fr.category_clustering(df, category_key)

Category clustering of recall sequences.

Calculates ARC (adjusted ratio of clustering) and LBC (list-based clustering) statistics indexing recall clustering by category.

The papers introducing these measures do not describe how to handle repeats and intrusions. Here, to maintain the assumptions of the measures, repeats and intrusions are removed from the recall sequences before the statistics are calculated.
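As a rough illustration of what removing repeats and intrusions means for a single list, the sketch below keeps only the first recall of each studied item. The helper name `clean_recalls` and the toy word lists are hypothetical; this is not psifr's internal code, which works on the merged DataFrame.

```python
def clean_recalls(studied, recalled):
    """Keep the first recall of each studied item, in recall order.

    Hypothetical helper for illustration only; not part of psifr.
    """
    studied_set = set(studied)
    seen = set()
    cleaned = []
    for item in recalled:
        # Intrusions are recalls of items that were not on the study list.
        if item not in studied_set:
            continue
        # Repeats are second and later recalls of the same item.
        if item in seen:
            continue
        seen.add(item)
        cleaned.append(item)
    return cleaned


studied = ['cat', 'dog', 'oak', 'elm']
recalled = ['dog', 'cat', 'dog', 'pine', 'elm']
cleaned = clean_recalls(studied, recalled)
print(cleaned)  # ['dog', 'cat', 'elm']
```

Here 'pine' is dropped as an intrusion and the second 'dog' is dropped as a repeat, so every remaining recall maps to exactly one studied item.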

Note that ARC is undefined when only one category is recalled. Lists with undefined statistics are excluded when calculating the mean statistics for each subject. To calculate the statistics for each list separately, group by list before calling the function. For example: df.groupby('list').apply(fr.category_clustering, 'category').
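To see why ARC is undefined when only one category is recalled, it can help to compute it by hand for one recall sequence. The sketch below assumes the commonly cited definition, ARC = (R - E(R)) / (max R - E(R)), where R is the number of adjacent same-category recalls, E(R) = sum(n_i^2) / N - 1, and max R = N - k for N recalls from k categories. The function `arc` and the toy sequences are hypothetical, and psifr's own implementation may differ in detail.

```python
import math
from collections import Counter


def arc(categories):
    """ARC for one recall sequence of category labels.

    Assumes repeats and intrusions were already removed.
    Hypothetical helper for illustration; not psifr's implementation.
    """
    n = len(categories)
    if n == 0:
        return math.nan
    counts = Counter(categories)
    k = len(counts)
    # Observed repetitions: adjacent recalls from the same category.
    observed = sum(a == b for a, b in zip(categories, categories[1:]))
    # Repetitions expected by chance, and the maximum possible.
    expected = sum(c ** 2 for c in counts.values()) / n - 1
    maximum = n - k
    if maximum == expected:
        # Zero denominator, e.g. when only one category is recalled.
        return math.nan
    return (observed - expected) / (maximum - expected)


lists = [
    ['a', 'a', 'b', 'b'],  # perfectly clustered
    ['a', 'b', 'a', 'b'],  # perfectly alternating
    ['a', 'a', 'a'],       # one category only: undefined
]
values = [arc(seq) for seq in lists]

# Average over lists, skipping the undefined ones.
defined = [v for v in values if not math.isnan(v)]
mean_arc = sum(defined) / len(defined)
print(values, mean_arc)
```

With a single category, max R and E(R) both equal N - 1, so the denominator is zero. Skipping such lists before averaging mirrors the exclusion described above.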

Parameters:
  • df (pandas.DataFrame) – Merged study and recall data. See merge_free_recall. Must include a column indicating the category of each study and recall event.

  • category_key (str) – Column with category labels. Labels may be any hashable (e.g., a str or int).

Returns:

stats – One row per subject, with columns for the mean ARC and LBC statistics.

Return type:

pandas.DataFrame

Examples

>>> from psifr import fr
>>> raw = fr.sample_data('Morton2013')
>>> mixed = raw.query('list_type == "mixed"')
>>> data = fr.merge_free_recall(mixed, list_keys=['category'])
>>> stats = fr.category_clustering(data, 'category')
>>> stats.head()
   subject       lbc       arc
0        1  3.657971  0.614545
1        2  2.953623  0.407839
2        3  3.363768  0.627371
3        4  4.444928  0.688761
4        5  7.530435  0.873755