An important aspect of scientific research is that findings are reproducible, falsifiable and transparent. In empirical work especially, it is of the utmost importance to make datasets available. Wanting to see the data behind a publication should become a natural reflex. No matter how well a publication describes its variables, it is always interesting and insightful to learn how individual observations were annotated. From your own experience, you probably already know how difficult it usually is to decide which value to assign to the variables you are investigating. Other corpus linguists have the same insecurities. Perhaps that is why many (corpus) linguists do not make their datasets freely available. Usually, though, they bring two kinds of arguments to the table.
Although it is a good idea to build your datasets in spreadsheet software, it is an even better idea to save your dataset (once you have finished annotating, of course) in the CSV format.
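To make the idea of saving a dataset as CSV concrete, here is a minimal sketch in Python using the standard `csv` module. The observations and the column names (`id`, `token`, `construction`, `animacy`) are invented purely for illustration and are not taken from any real study. The functions `write_dataset` and `read_dataset` are hypothetical helper names.

```python
import csv
import io

# Hypothetical annotated observations; the column names are illustrative only.
rows = [
    {"id": 1, "token": "give", "construction": "ditransitive", "animacy": "animate"},
    {"id": 2, "token": "send", "construction": "prepositional", "animacy": "inanimate"},
]


def write_dataset(rows, handle):
    """Write a list of dicts to an open text handle as CSV, header row first."""
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


def read_dataset(handle):
    """Read CSV from an open text handle into a list of dicts (all values are strings)."""
    return list(csv.DictReader(handle))


# Write to an in-memory buffer so the sketch runs without touching the disk.
buffer = io.StringIO()
write_dataset(rows, buffer)
csv_text = buffer.getvalue()

# Reading the text back shows that the annotation survives the round trip.
restored = read_dataset(io.StringIO(csv_text))
```

To write to an actual file instead of a buffer, the handle could be opened with `open("dataset.csv", "w", newline="", encoding="utf-8")`, where the file name is again only a placeholder. Note that CSV stores everything as plain text, so numeric columns come back as strings and need to be converted when the file is read.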
Datasets are among the most important objects in a scientific study. It is best to stick to a widely used format for your dataset so that other people are able to understand what you have done. To find a good format for corpus-linguistic datasets, we first need to investigate the nature of corpus-linguistic data.