Monday, May 16, 2005

The joys (and perils) of having data

I'm working on a talk for the conference next week, and right now I am in the midst of the horrible process of importing Excel graphs into PowerPoint and then making them look pretty again. (I don't understand how such awful things can happen going between two Office products.)

As I think about which graphs to include and what I'll say about them, I'm finding myself re-evaluating some of my data. For example, I spent 30 minutes in crisis mode over 4 points (of 26) on one graph. If I throw out those 4 points (and I have some justification for doing so), my regression r-squared improves and my slope changes quite a bit. But what I really want the regression for is another level of interpretation, and the change in slope makes very little difference for that. So the points will stay for now, but I am sure they will be re-evaluated when I put that figure into the paper I am eventually going to write. And then more time will be spent weighing my options.
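For anyone who wants to run the same kind of check, here is a minimal sketch in Python that fits a straight line with and without a set of suspect points and reports the slope and r-squared for each. The numbers are invented for illustration and are not the data from the graph described above; the helper name `linear_fit` and the choice of which 4 points count as suspect are assumptions made for the example.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up data (an assumption for illustration): 26 points near the line
# y = 2x + 1, with a small alternating wobble standing in for noise.
xs = list(range(26))
ys = [2.0 * x + 1.0 + (0.5 if x % 2 else -0.5) for x in xs]

# Hypothetical "suspect" points: 4 of the 26, pushed well below the line.
suspect = {20, 22, 23, 25}
for i in suspect:
    ys[i] -= 15.0

# Fit once with everything, once with the suspect points dropped.
all_fit = linear_fit(xs, ys)
kept = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in suspect]
trimmed_fit = linear_fit([x for x, _ in kept], [y for _, y in kept])

for label, (slope, intercept, r2) in [
    ("all 26 points", all_fit),
    ("without the 4", trimmed_fit),
]:
    print(f"{label}: slope={slope:.3f}, intercept={intercept:.3f}, r^2={r2:.3f}")
```

Keeping both fits side by side, along with the list of dropped points, is one simple way to document the decision whichever way it goes.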

I am reminded of my mom saying that in field work every decision you make has ethical consequences. Well, it doesn't stop when you get back to the lab (or spreadsheet) either. In talking with other people about this sort of decision making, it seems that the common practice is to document, document, document. But that documentation will end up buried in my thesis somewhere, not on display in the published article. Is that okay? What sorts of decisions are other researchers making about which data to include and which to leave out?

How much do all the little, seemingly insignificant decisions we make about where to sample, how long to measure things, which points to throw out, what curve to fit, etc. add up to affect our end interpretations and our understanding of the way the system or process works? These are questions that can't be answered by simple error propagation, but they are usually missed in the rare ethics seminars that we take. So I am left to ponder the cosmic significance of my 4 points.

1 comment:

Writer Chica said...

I had trouble with that sort of thing too. All of the data stayed in unless it had a really good reason to be dropped. This usually meant rerunning something or me figuring something out and getting the okay from Deb. I suppose that is the difference between a master's and a Ph.D. Soon you will be the professor and need to decide. Sometimes it is hard to tell between what's trash, what looks like trash but isn't, and what is irrelevant whether it is trash or not.