A fundamental component of language acquisition involves organizing words into grammatical categories. Previous literature has suggested a number of ways in which this categorization task might be accomplished. Here we ask whether the patterning of the words in a corpus of linguistic input (distributional information) is sufficient, along with a small set of learning biases, to extract these underlying structural categories. In a series of experiments, we show that learners can acquire linguistic form-classes, generalizing from instances of the distributional contexts of individual words in the exposure set to the full range of contexts for all the words in the set. Crucially, we explore how several specific distributional variables enable learners to form a category of lexical items and generalize to novel words, yet also allow for exceptions that maintain lexical specificity. We suggest that learners are sensitive to the contexts of individual words, the overlaps among contexts across words, the non-overlap of contexts (or systematic gaps in information), and the size of the exposure set. We also ask how learners determine the category membership of a new word for which there is very sparse contextual information. We find that, when there are strong category cues and robust category learning of other words, adults readily generalize the distributional properties of the learned category to a new word that shares just one context with the other category members. However, as the distributional cues regarding the category become sparser and contain more consistent gaps, learners show more conservatism in generalizing distributional properties to the novel word. Taken together, these results show that learners are highly systematic in their use of the distributional properties of the input corpus, using them in a principled way to determine when to generalize and when to preserve lexical specificity.
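The core mechanism described above — grouping words into form-classes on the basis of the contexts they share — can be illustrated with a toy sketch. This is not the authors' experimental paradigm or model; it is a minimal illustration, under simplifying assumptions, of how overlap among distributional contexts could support category induction. The miniature vocabulary (`alt`, `erd`, `dup`, `fim`, `ker`, `sib`), the Jaccard overlap measure, and the merge threshold are all hypothetical choices made for the example.

```python
# Toy sketch (not the authors' model): infer form-classes by grouping
# words whose sets of local contexts overlap substantially.
from collections import defaultdict


def context_profiles(corpus):
    """Map each word to the set of (previous, next) contexts it occurs in."""
    profiles = defaultdict(set)
    for sentence in corpus:
        padded = ["<s>"] + sentence + ["</s>"]
        for i in range(1, len(padded) - 1):
            profiles[padded[i]].add((padded[i - 1], padded[i + 1]))
    return profiles


def overlap(a, b):
    """Jaccard overlap between two context sets (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def induce_classes(corpus, threshold=0.2):
    """Greedily merge words into a class when their context overlap
    with that class exceeds the threshold; otherwise start a new class.
    Words with non-overlapping contexts stay lexically distinct."""
    profiles = context_profiles(corpus)
    classes = []  # each entry: (set of member words, union of their contexts)
    for word, ctxs in profiles.items():
        for members, class_ctxs in classes:
            if overlap(ctxs, class_ctxs) >= threshold:
                members.add(word)
                class_ctxs |= ctxs
                break
        else:
            classes.append(({word}, set(ctxs)))
    return [members for members, _ in classes]


# Miniature artificial language with three positional word classes,
# analogous in spirit (only) to the exposure corpora described above.
corpus = [
    ["alt", "dup", "ker"], ["alt", "dup", "sib"],
    ["erd", "dup", "ker"], ["erd", "dup", "sib"],
    ["alt", "fim", "ker"], ["erd", "fim", "sib"],
]

print([sorted(c) for c in induce_classes(corpus)])
```

On this corpus the sketch recovers three classes: words in the same sentence position share contexts and merge, while words that never share a context remain separate. Raising the threshold makes the learner more conservative, loosely mirroring the abstract's point that sparser, gappier distributional cues lead learners to withhold generalization.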