I’ve spent the last few months working with the folks at Codecademy to create a tutorial targeted at experimentalists. It covers what I have found to be the most commonly-encountered-but-easily-solvable problem in experimental science: data management. Especially with older machines, data is often generated in non-ideal formats. The data might present itself in obscure text files, or maybe you only need every nth datapoint. Regardless, the usual solution that I’ve seen is for a scientist to spend hours manually copying and pasting important information. Not only is this dangerous – transcription errors are a real thing – but it’s miserably boring.
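As a rough illustration of the kind of cleanup involved, here is a minimal Python sketch that reads a whitespace-delimited text dump, skips anything that isn't a numeric row, and keeps every nth datapoint. The two-column layout, the sample values, and the name parse_every_nth are assumptions made up for this example; they are not the instrument's real format or the code from the tutorial.

```python
def parse_every_nth(lines, n, comment_prefix="#"):
    """Return (time, intensity) pairs from every nth numeric data line.

    Hypothetical helper: assumes two whitespace-separated numeric columns.
    """
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(comment_prefix):
            continue
        fields = line.split()
        try:
            rows.append((float(fields[0]), float(fields[1])))
        except (ValueError, IndexError):
            # Not a two-column numeric row (e.g. a text header); skip it.
            continue
    # Slice with a step of n to keep only every nth datapoint.
    return rows[::n]


# Made-up sample standing in for an instrument's text output.
raw_text = """# hypothetical instrument output
time intensity
0.0 10.5
0.1 11.0
0.2 9.8
0.3 10.1
0.4 10.9
"""

subset = parse_every_nth(raw_text.splitlines(), n=2)
print(subset)
```

With a real file, the same function could be fed the lines of an open file object instead of the sample string.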
This issue came up in the work behind one of my earlier papers (Link to the paper and my blog post). In one set of experiments, we wanted to view the motion of a single dye molecule in various metal-organic framework crystals. Amazingly, this task can be accomplished through an approach called fluorescence correlation spectroscopy, or FCS. This method can monitor the fluorescence intensity of a small volume (think sub-micron) over time, providing information on flow. In a very dilute system, the approach can monitor the volume around single molecules, which allows us to extrapolate data on different time scales – we can observe Brownian motion, adsorption kinetics, and diffusion.
The resultant data of a set of experiments comes out looking like this large text file (I don’t know if I have rights to the raw data, so I’ve mangled the values. I have, however, attempted to keep the underlying trends the same.) We needed to autocorrelate the data and fit to models to ascertain diffusion and adsorption coefficients, but we obviously couldn’t do that in its current form. We moved the data to Excel using the approach described in the Codecademy tutorial, and were then able to fit our models. Victory.
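For readers curious what "autocorrelate" means in practice, here is a minimal sketch in plain Python. It assumes the common normalized fluctuation form, G(lag) = <dI(t) dI(t + lag)> / <I>^2, computed on an evenly sampled intensity trace. The function name and the toy trace are made up for illustration, and this is not necessarily how we processed the data for the paper.

```python
def autocorrelation(intensity, max_lag):
    """Normalized fluctuation autocorrelation for lags 1..max_lag.

    Assumes evenly spaced samples and max_lag < len(intensity).
    """
    n = len(intensity)
    mean = sum(intensity) / n
    # Fluctuation of each sample around the mean intensity.
    deltas = [value - mean for value in intensity]
    result = []
    for lag in range(1, max_lag + 1):
        # Average the product of fluctuations separated by `lag` samples.
        products = [deltas[i] * deltas[i + lag] for i in range(n - lag)]
        result.append(sum(products) / len(products) / mean ** 2)
    return result


# Toy trace that alternates between two values (not real FCS data).
trace = [10.0, 12.0, 10.0, 12.0, 10.0, 12.0]
g = autocorrelation(trace, max_lag=2)
print(g)
```

In this toy trace, neighboring samples sit on opposite sides of the mean, so the lag-1 value is negative, while samples two steps apart match, so the lag-2 value is positive.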
If anyone has any questions or comments about the Codecademy tutorial or the underlying FCS experiments, feel free to ask!

I tried your tutorial on Codecademy and it was really good! There were a few hitches but I liked it and would like to see more from you.
Some thoughts: You may want to give a brief explanation of what “zip” does.
Also, I would get the correct answer to a couple of the exercises, but because I did not do it the way you decided was right it would not give me a pass. On one of them I had to look ahead to the next exercise to get the answer you were looking for. I suspect some of this frustration is a function of how the tutorials are created.
Still I LOVED that you used real world examples, and documented the code, and gave syntax for the file read/write even if it would not be used because of security, and, and, and…
Also it brought me to your blog. Which is also pretty cool. I am a datageek learning Python. I don’t have a background in chemistry at all, but I still find your blog fascinating and informative.
Thank you very much!
Glad to hear you liked it!
I wrote a hint for zip() in the lesson, but I’ve still been getting a lot of comments about using it. I’ll change it soon.
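In the meantime, here is a minimal sketch of what zip() does: it walks several sequences in step and hands back tuples of matching items. The list names and values below are made up for illustration and are not taken from the lesson.

```python
# Two parallel columns, as they might look after parsing a data file.
times = [0.0, 0.1, 0.2]
intensities = [10.5, 11.0, 9.8]

# zip() pairs up items that share the same position in each list.
pairs = list(zip(times, intensities))

# Unpacking the pairs with * transposes them back into columns (as tuples).
columns = list(zip(*pairs))

for time, intensity in pairs:
    print(time, intensity)
```

Note that zip() stops at the end of the shortest input, so mismatched column lengths get truncated silently.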
The submission correctness issues are on me, as well. I like having people struggle a bit through a lesson (especially toward the end), but more steps -> more ways to solve. It makes the tests a little more complicated, but I could definitely update them.
Additionally, the people behind Codecademy have been working toward making a prerequisite structure for lessons, so I could one day point users to a lesson on zip(). This would also allow me to remove my basic string tutorial and focus more on data parsing.
They’re also talking about adding in file reading, which would really help out with the lesson as a whole.
Thanks again for the kind words! Are there any other lessons you’d be interested in seeing?
Any further lessons I would like to see? Yes! At this point I am still pretty new, so just about anything. One thing I can think of is how to create and use classes, well… the whole object philosophy really, as it is realized in Python.
Other things I would like to see are lessons on user interface creation, graphics, and pulling in information from web APIs and populating databases with it. None of which I would think Codecademy will let you do.
At this point I don’t know much even about what it is I don’t know, so pretty much anything not already on the site.
Wow, you’re right. There’s not much on classes in Codecademy. Really odd. Although classes are much less necessary in Python than in other languages (especially Java), they’re still a fundamental part of object-oriented programming. I’ll bring that up with them.
Not knowing what you don’t know is one of the most frustrating things ever. I’ve done some work on big data, so I could at least try to point you in the right direction with some stuff?
User interfaces – there’s a lot of debate here. If you want something in Python, I’d recommend the wxPython library. It’s super easy to use, and you’ll get basic functionality quickly.
If you’re looking for data visualization, check out D3.js (Data-Driven Documents). It’s awesome, but you’ll need some JavaScript skill to use it.
For pulling data – if you can already pull the data (which it sounds like you can), then you’re most of the way there. Once it’s in your code (i.e., x = whatever), you can look into database APIs. My favorite database – by far – is MongoDB. Their Python wrapper, pymongo, is amazingly straightforward.
Hope some of this helps!
Actually pulling the data is part of what I need to figure out. Still what you gave me is pretty cool. I am going to have to stop commenting for a while as you keep giving me more awesome things to look up and understand. Thank you!
Hey, just thought I’d like to say a big thank you on the course you wrote on Codecademy! Despite having almost no background whatsoever in science, I found your course to be a great example of real-life applications in Python.
Frankly I loved how open a lot of the lessons were as to how you could solve them – really encourages self-teaching. I only started Python a month ago but your course has been one of the most satisfying – shame it’s not on the official course list in Codecademy!
I may have to poke around your blog a bit since some of the posts look pretty interesting!
Thanks a lot! I’ve always preferred learning through open-ended problems and struggling with things a little bit, so I wanted to make a tutorial with that philosophy.
This blog is a work in progress, but I’m hoping to write tutorials on everything computational in science. Some of them might turn into more Codecademy tutorials, depending on the amount of interest they get!
Hey Patrick. Are you aware that nobody can pass exercise 3.3 Extracting The Data in your course? Check out the forum at http://www.codecademy.com/forums/data-management-for-scientists/2/exercises/2. Please fix it. Thanks
I did what I could. Hope it helps!