The conformational space accessed by a folding macromolecule is vast, and how best to project computer simulations of protein folding trajectories onto an interpretable sequence of discrete states is an open research problem. There are numerous alternative ways of grouping individual configurations into collective states, and in deciding on the number of such clustered states there is a trade-off between human interpretability (fewer states) and accuracy of representation (more states). Here we introduce a trajectory likelihood measure for assessing alternative discrete state models of protein folding. We find that widely used RMSD-based clustering methods require large numbers of initial states and a second agglomeration step based on kinetic connectivity to produce models with high predictive power; this is the approach taken in elegant recent work with Markov State Models of protein folding. In contrast, we find that grouping of states based on secondary structure pairings or contact maps, when refined with K-means clustering, yields higher likelihood models with far fewer states. Using the most predictive contact map representation to study the folding transitions of the WW domain in very long molecular dynamics simulations, we identify new states and transitions. The methods should be generally useful for investigating structural transitions in folding simulations of larger proteins.
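As a rough illustration of what scoring a discrete state model by likelihood can look like, the following minimal sketch estimates first-order transition probabilities from one state-assigned trajectory and evaluates the log-likelihood of a second, held-out trajectory. This is an assumption made for illustration and not necessarily the trajectory likelihood measure introduced here; the function names, the pseudocount, and the toy state sequences are all hypothetical.

```python
import math


def fit_transition_probs(states, n_states, pseudocount=1.0):
    """Estimate first-order transition probabilities from a state sequence.

    `pseudocount` (an illustrative choice) keeps unseen transitions from
    receiving zero probability, which would make the log-likelihood undefined.
    """
    counts = [[pseudocount] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def log_likelihood(states, probs):
    """Sum of log transition probabilities along a state sequence."""
    return sum(math.log(probs[a][b]) for a, b in zip(states, states[1:]))


# Toy data (hypothetical): frames already assigned to states 0, 1, 2.
train = [0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
held_out = [0, 1, 1, 2, 2, 0]

probs = fit_transition_probs(train, n_states=3)
ll = log_likelihood(held_out, probs)
per_step = ll / (len(held_out) - 1)
print(f"held-out log-likelihood: {ll:.3f} ({per_step:.3f} per transition)")
```

A likelihood over state sequences alone cannot be compared fairly across models with different numbers of states, since a single-state model trivially assigns probability 1 to every sequence. A usable measure would also need a term describing how well each state accounts for the structures assigned to it.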