So this semester is pretty much the opposite of last semester. Instead of working solo or in pairs at a lab bench, we’re working in groups on computers, which in many ways is a lot more fun and interesting. The first two days were spent entirely on downloading computer programs, which was mind-numbingly boring. Then we got the sequenced genome of Manatee, which let us do fun things with interesting programs. The main program we’re working with is DNA Master, which scanned the genome for start and stop codons and gave us a list of 92 putative genes. However, the program is not perfect, so we have to go through and check every one of them.
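For anyone curious what “checking for start and stop codons” involves, here is a minimal Python sketch of that kind of scan. It is not how DNA Master actually calls genes: it only looks at the forward strand (a real tool also scans the reverse complement), and the start codon set and the 30-codon minimum length are my own assumptions for illustration.

```python
# Minimal sketch: scan the forward strand of a DNA sequence for open reading
# frames (ORFs), i.e. a start codon followed in the same frame by a stop codon.
# Assumptions (mine, not DNA Master's): the start codon set, the 30-codon
# minimum length, and ignoring the reverse strand.

START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial/phage start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for each ORF on the forward strand.

    Each ORF runs from the first start codon after the previous in-frame stop
    up to and including the next in-frame stop codon. Coordinates are 0-based
    with an exclusive end, so seq[start:end] is the ORF itself.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first (most upstream) start in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for a new start after this stop
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAAGG", min_codons=1))
```

On the toy sequence `CCATGAAATTTTAAGG` with `min_codons=1`, it finds a single ORF, `(2, 14, 2)`: the ATG at position 2 through the TAA stop codon. With the default minimum of 30 codons, that toy ORF is too short and nothing is returned, which is roughly the “is it long enough to be a gene” question.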
For each gene, we first check whether it is long enough to really be a gene. This hasn’t been a problem so far, aside from a brief debate over gene 1. Then we check whether there is a good Shine-Dalgarno score, which also hasn’t been much of a problem. If there are two possible starts, we sometimes choose the one with the better Shine-Dalgarno score. We then run the amino acid sequence through BLAST, which finds similar sequences in other bacteriophages. In this search we typically look for a good alignment with high identity and similarity. This led to a long debate over the difference between identity and similarity, which helped clarify what exactly we wanted in a good BLAST hit: identity counts the positions where the two proteins have exactly the same amino acid, while similarity also counts positions where the amino acids differ but have similar chemical properties. We also look at coding potential, meaning how likely it is, based on statistical patterns in the DNA sequence, that a region actually codes for a protein.
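To make the identity/similarity distinction concrete, here is a small Python sketch that scores two protein sequences that have already been aligned. BLAST reports these numbers as “Identities” and “Positives” and decides what counts as similar using a substitution matrix (BLOSUM62 by default for protein searches); the amino acid groups below are a hypothetical simplification, not that matrix.

```python
# Minimal sketch: percent identity vs. percent similarity for two proteins
# that have ALREADY been aligned (same length, '-' marks a gap).
# Assumption: SIMILAR_GROUPS is a hypothetical simplification; BLAST instead
# scores each pair of amino acids with a substitution matrix such as BLOSUM62.

SIMILAR_GROUPS = ["ILMV", "FWY", "KR", "DE", "ST"]


def is_similar(a: str, b: str) -> bool:
    """True for identical residues or two residues in the same group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) over the alignment length.

    Gap columns count toward the length but never as identical or similar.
    """
    if not aln1 or len(aln1) != len(aln2):
        raise ValueError("aligned sequences must be non-empty and equal length")
    pairs = [(a, b) for a, b in zip(aln1.upper(), aln2.upper())
             if a != "-" and b != "-"]
    identical = sum(a == b for a, b in pairs)
    similar = sum(is_similar(a, b) for a, b in pairs)
    length = len(aln1)
    return 100 * identical / length, 100 * similar / length


if __name__ == "__main__":
    print(identity_and_similarity("MKVLA", "MRILA"))  # (60.0, 100.0)
```

In the example, three of the five positions are identical (60% identity), but K/R and V/I are chemically similar swaps, so similarity is 100%. A hit like that would look more convincing than its identity alone suggests.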
The last thing we check for each gene is HHpred. This takes the amino acid sequence and runs it against curated databases of known proteins using a more sensitive method than BLAST. It tells you how likely it is that your protein is related to a known protein in another organism, along with an E-value, which is the number of hits this good you would expect to get by chance alone (the lower, the better). Many of the genes have no good HHpred hits, which isn’t ideal, but doesn’t necessarily mean anything is wrong. However, a couple of results suggested that I run the sequence through COILS/PCOILS. I had never heard of this program before, but I figured I should follow the suggestion. Lo and behold, two of the three genes I ran through it reached a probability of 1 on the chart it produced.
I had no idea how to read them, but I thought they might be important, so I saved pictures of the charts and brought them to class today. Apparently, some proteins contain a pattern that repeats every seven amino acids (a heptad, with positions labeled a through g). The first and fourth amino acids of each heptad (a and d) are usually hydrophobic, and because an alpha helix makes one turn about every 3.6 amino acids, they end up along the same side of the helix as a hydrophobic stripe. The fifth and seventh amino acids (e and g) sit right next to that stripe and are often charged. When two alpha helices with these stripes are next to each other, they wrap around one another to bury their hydrophobic amino acids, forming a structure called a coiled coil. COILS/PCOILS tells you how likely it is that each stretch of your protein forms a coiled coil; the closer the probability is to 1, the more likely it is to form. Using this information, I found that protein 16 has a 99% probability of forming a coiled coil in its last 40 amino acids, and that protein 24 has a probability above 80% between amino acids 20 and 40 and between amino acids 320 and 360.
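Here is a small Python sketch of the heptad idea. It is not the COILS algorithm, which scores stretches of sequence against a matrix built from known coiled coils; it just shows where each amino acid sits around the helix (assuming 100° per residue, i.e. 3.6 residues per turn) and how hydrophobic the a and d positions are for a chosen starting position, or register. The hydrophobic set, the `register` parameter, and the example sequence are my own assumptions.

```python
# Minimal sketch of the heptad idea behind coiled coils. This is NOT the
# COILS algorithm; it only illustrates why positions a and d line up.
# Assumptions: 100 degrees per residue, a simplified hydrophobic set, and a
# hypothetical `register` parameter giving the index of the first 'a'.

DEGREES_PER_RESIDUE = 100  # 360 / 3.6 residues per turn of an alpha helix
HEPTAD = "abcdefg"
HYDROPHOBIC = set("AILMFVW")  # simplified, hypothetical hydrophobic set


def wheel_angle(i: int) -> int:
    """Angle (degrees) of residue i around the helix axis, viewed end-on."""
    return (i * DEGREES_PER_RESIDUE) % 360


def heptad_hydrophobic_fraction(seq: str, register: int = 0) -> float:
    """Fraction of a/d positions that hold hydrophobic residues."""
    core = [aa for i, aa in enumerate(seq.upper())
            if HEPTAD[(i - register) % 7] in "ad"]
    if not core:
        return 0.0
    return sum(aa in HYDROPHOBIC for aa in core) / len(core)


if __name__ == "__main__":
    print([wheel_angle(i) for i in (0, 3, 7, 10)])  # [0, 300, 340, 280]
    toy = "LKELEKKLKELEKK"  # two made-up heptads: L at a and d
    print(heptad_hydrophobic_fraction(toy))              # 1.0
    print(heptad_hydrophobic_fraction(toy, register=1))  # 0.0
```

Residues 0, 3, 7, and 10 (the a, d, a, d positions) land at 0°, 300°, 340°, and 280°, all on one side of the helix. Each heptad drifts by 20° (7 × 100° = 700°, 20° short of two full turns), which is the usual explanation for why the helices wind around each other instead of staying straight. Shifting the register by one moves the “core” onto the charged K and E residues, and the hydrophobic fraction drops to zero.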
While this might be irrelevant to our genome annotation overall, I think it’s pretty cool to be able to know what the protein looks like. I personally prefer alpha helices over beta sheets, and knowing that they can wrap around each other makes me envision twin spiral staircases, which would look pretty cool in molecule form. I’m thinking that for my “special project” in April I could look at similar genes and see if they also have these awesome spirals. I’m not sure what else I would do with it, but maybe I’ll come up with something as we move further into the semester.