AE BB DB Explorer


Date: 2012/11/20 14:32:41, Link
Author: RumraketR
Quote (Jerry Don Bauer @ Nov. 19 2012,16:37)
Comparing the genome to computer data storage. In order to represent a DNA sequence on a computer, we need to be able to represent all 4 base pair possibilities in a binary format (0 and 1). These 0 and 1 bits are usually grouped together to form a larger unit, with the smallest being a “byte” that represents 8 bits. We can denote each base pair using a minimum of 2 bits, which yields 4 different bit combinations (00, 01, 10, and 11).  Each 2-bit combination would represent one DNA base pair.  A single byte (or 8 bits) can represent 4 DNA base pairs.  In order to represent the entire diploid human genome in terms of bytes, we can perform the following calculations:

6×10^9 base pairs/diploid genome x 1 byte/4 base pairs = 1.5×10^9 bytes or 1.5 Gigabytes, about 2 CDs worth of space!


http://bitesizebio.com/article....-genome

Is 1.5 gigabytes more than 500 bits? Then why would we want to go any further than this, as you already have the answer before you start?

ANY organism will be over 500 bits.
Hello everyone, I've been a lurker here for a few years now and I just have to respond because this could be historical stuff.

I want to make sure I understand you correctly here, Jerry Don Bauer, because according to what I have quoted, you seem to be saying that the quantity of information in a string of symbols is equal to the length of the string divided by the number of possible symbols at each locus? As in the information content is measured in bits and is thus proportional to the length of the sequence?

You refer to the example of a 6 billion base-pair diploid genome, divided by the number of possibilities per site (4):

6×10^9 base pairs/diploid genome x 1 byte/4 base pairs = 1.5×10^9 bytes or 1.5 Gigabytes, about 2 CDs worth of space!

In other words, the information content of a sequence of DNA, for example 12 base-pairs in length, AUGAATAUGTTA, is equal to 12 base pairs x 1 byte/4 base pairs = 3 bytes.

Am I correct in my understanding here?
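
To make the arithmetic concrete, here is a minimal sketch of the calculation as I read it from the quoted article: 2 bits per base, 4 bases per byte. The function name `storage_size` is my own, and this only counts raw storage under that encoding, which is an assumption about what is being measured.

```python
BITS_PER_BASE = 2  # 4 possible bases -> 2 bits each (00, 01, 10, 11)


def storage_size(num_bases):
    """Return (bits, bytes) needed to store num_bases at 2 bits per base."""
    bits = num_bases * BITS_PER_BASE
    n_bytes = (bits + 7) // 8  # round up to whole bytes
    return bits, n_bytes


# The 12-base example string from this post.
print(storage_size(len("AUGAATAUGTTA")))  # (24, 3)

# The 6 billion base-pair diploid genome from the quoted article.
print(storage_size(6 * 10**9))  # (12000000000, 1500000000)
```

Note that the result depends only on the length of the string, not on which bases it contains.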

Date: 2012/11/20 16:18:41, Link
Author: RumraketR
Quote (Jerry Don Bauer @ Nov. 20 2012,15:51)
We are discussing Complex Specified Information and what makes certain information complex, or not and/or specified or not.

This has little to do with the length of anything or the amount of loci it harbors.

Quote
You refer to the example of a 6 billion base-pair diploid genome, divided by the number of possibilities per site (4):

6×10^9 base pairs/diploid genome x 1 byte/4 base pairs = 1.5×10^9 bytes or 1.5 Gigabytes, about 2 CDs worth of space!

In other words, the information content of a sequence of DNA, for example 12 base-pairs in length, AUGAATAUGTTA, is equal to 12 base pairs x 1 byte/4 base pairs = 3 bytes.

Am I correct in my understanding here?

You are referring to a link I referenced. The purpose of that link was to show that even a genome contains much more information than the 500 bits upper probability boundary. Therefore, an entire organism most certainly would be over 500 bits and therefore CSI.....

Okay, fair enough, I think I understand. But just to be sure: you agree that the quantity of information in the genome there is 1.5 gigabytes? Not CSI, not entropy, just 1.5 gigabytes of information, and 1.5 gigabytes is more than 500 bits (500 bits being the bound above which a quantity of information would qualify as CSI). Right?

If my understanding is not correct, could you clarify how you would calculate the quantity of information in an arbitrary string of DNA, for example?

You can use any stretch of DNA you want, like a real-world promoter sequence (or mRNA transcript, or whatever you like), or just use a small random string for the purpose, like the one I supplied. Anything is fine with me. I just want to make sure that we agree on how to calculate the quantity of information in a string of symbols, like DNA, in bytes.

I understand that the quantity itself is not what makes it Complex or Specified. I just want to make sure we agree on how to calculate the quantity.
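
For what it's worth, if the quantity really is just 2 bits per base, the comparison against the 500-bit bound can be sketched like this. The names `exceeds_threshold` and `THRESHOLD_BITS` are mine, and the 2-bits-per-base measure is my reading of the quoted article, not something we have agreed on yet.

```python
THRESHOLD_BITS = 500  # the bound mentioned in the thread
BITS_PER_BASE = 2     # assumed: 4 possible bases -> 2 bits each


def exceeds_threshold(num_bases, threshold_bits=THRESHOLD_BITS):
    """True if num_bases at 2 bits per base is strictly more than the bound."""
    return num_bases * BITS_PER_BASE > threshold_bits


# Shortest sequence that is strictly over the bound under this measure.
min_bases = THRESHOLD_BITS // BITS_PER_BASE + 1

print(min_bases)                     # 251
print(exceeds_threshold(12))         # False
print(exceeds_threshold(6 * 10**9))  # True
```

Under that assumption, any sequence longer than 250 bases would be over 500 bits, regardless of what the sequence is.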

Quote (Jerry Don Bauer @ Nov. 20 2012,15:51)
That was all I was pointing out.....I certainly did not want to get into genomic entropy and the like at this point.

That's absolutely fine with me, we don't have to delve into entropy or anything. I just want to reach an agreement on the basics, like how to calculate information quantity in stretches of DNA.

That's why I brought up the example you quoted earlier, because you seemed to be using a method that corresponded to length of string divided by number of symbols and reporting the result in bytes.

If this is not how you would calculate information content in a string of symbols, how would you do it? If you could give an example, I would be most grateful.

Thanks again for your time :)