how to calculate the evolution of a protein

scruffy

This issue apparently mystifies evolutionists and creationists alike. So I will explain it. Using math.

I will illustrate by example.

Let us say we have an alphabet of 20 letters (the canonical amino acids), and let us say we are trying to build a word (protein) with the proper spelling (sequence) and the right meaning.

For this example we will use the word "abracadabra".

Formally, this is a well-known problem in mathematics. It is called the "word problem". Here is the Wiki about it:


We cannot simply multiply probabilities; what we have to do is calculate all the ways the word can be put together, from individual letters AND from subsequences (subsequences having their own probabilities). For example, if we have pre-existing subsequences "abra", "ca", and "dab", what is the probability that we can complete the word by splitting off the "ra" from the first sequence and combining it with the last?
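The counting half of this can be sketched with a small dynamic program. Assuming the piece inventory is the single letters plus the pre-existing subsequences named above (counts only; in the probabilistic version each piece would carry a weight), a minimal sketch:

```python
# Count the ways "abracadabra" can be tiled left-to-right from a piece
# inventory: the single letters plus the pre-existing subsequences.

def count_assemblies(word, pieces):
    """Number of ways to build `word` by concatenating items from `pieces`."""
    n = len(word)
    ways = [0] * (n + 1)
    ways[0] = 1  # one way to build the empty prefix
    for i in range(1, n + 1):
        for p in pieces:
            if i >= len(p) and word[i - len(p):i] == p:
                ways[i] += ways[i - len(p)]
    return ways[n]

word = "abracadabra"
pieces = set(word) | {"abra", "ca", "dab"}  # letters + subsequences
print(count_assemblies(word, pieces))       # → 12
```

With only single letters available there is exactly one assembly; the three subsequences raise the count to 12.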

This is fundamentally an algebraic problem, and we can use the methods of group theory to address it.


If we're using a computer, we can use a "nested stack automaton" to do the work.


But wait, not so fast. Group theory tells us that some combinations are actually unsolvable (cannot be calculated). At some point we run up against the Boone-Rogers theorem, which states that there is no uniform (partial) algorithm that solves the word problem in all finitely presented groups.


For which groups is the word problem solvable, and for which is it not?

The answer has to do with isomorphic copies, which are strings that resemble themselves. Palindromes are an example. We can reference the work of Axel Thue, who was the first mathematician to study the problem in detail (around 1910). Thue studied "square-free words", which contain no adjacent repeated factor. (Example: "dining" is not square-free; "in" is an adjacent repeated factor.)
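Thue's square-free condition is easy to check mechanically. A minimal sketch (this is the standard definition, nothing protein-specific assumed):

```python
# Square-free check: a word is square-free if it contains no factor of
# the form xx (an adjacent repeat), per Thue's definition.

def is_square_free(word):
    n = len(word)
    for i in range(n):
        for length in range(1, (n - i) // 2 + 1):
            if word[i:i + length] == word[i + length:i + 2 * length]:
                return False  # found an adjacent repeated factor
    return True

print(is_square_free("dining"))       # → False ("in" + "in")
print(is_square_free("abracadabra"))  # → True (no adjacent repeat)
```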

In this case, though, we have a set of unique forward-looking sequences that are not adjacent. So the first question we ask is: how many ways can we factor the word? That is simple combinatorics.
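The factor count really is simple combinatorics: a word of length n has n-1 possible cut points, so 2^(n-1) ways to split it into contiguous nonempty factors. A sketch that enumerates them:

```python
# Enumerate every factorization of a word into contiguous nonempty
# factors: choose any subset of the n-1 cut points, giving 2**(n-1) total.
from itertools import combinations

def factorizations(word):
    n = len(word)
    for r in range(n):  # choose r of the n-1 cut points
        for cuts in combinations(range(1, n), r):
            bounds = [0, *cuts, n]
            yield [word[a:b] for a, b in zip(bounds, bounds[1:])]

word = "abracadabra"
all_splits = list(factorizations(word))
print(len(all_splits))  # → 1024, i.e. 2**10
```

The factorization ["abra", "ca", "dab", "ra"] used earlier is one of the 1024.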

What our automaton does is assign symbols to the subsequences and operate on them using the (nested) stack. If you're familiar with the Forth programming language, this process becomes intuitive. (You can also use lambda calculus; it's a little harder that way, but it works.)
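A Forth-flavored toy version of that stack discipline (the token names and the single CONCAT operation are illustrative inventions, not a real nested stack automaton):

```python
# Subsequence literals are pushed on a stack; the CONCAT word pops two
# items and pushes their concatenation.

def run(program):
    stack = []
    for token in program:
        if token == "CONCAT":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            stack.append(token)  # push a subsequence literal
    return stack

# Assemble "abracadabra" from "abra", "ca", "dab", and the split-off "ra":
program = ["abra", "ca", "CONCAT", "dab", "CONCAT", "ra", "CONCAT"]
print(run(program))  # → ['abracadabra']
```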

Frank Ramsey also studied this problem, around 1930, by mapping the sequences to graphs. He came up with the concept of "unavoidable patterns", which play into loops in the computational process. Another important issue is "necklaces", which are circular sequences. Baudot studied these when he invented the Baudot code.
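Necklaces, circular sequences counted up to rotation, have a standard count via Burnside's lemma; a short sketch (standard combinatorics, independent of the biological analogy):

```python
# Burnside count of necklaces: circular sequences of length n over a
# k-letter alphabet, counted up to rotation.
from math import gcd

def necklaces(n, k):
    return sum(k ** gcd(i, n) for i in range(n)) // n

print(necklaces(4, 2))   # → 6 binary necklaces of length 4
print(necklaces(11, 4))  # circular 11-letter words over a 4-letter alphabet
```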

Our automaton builds a graph, made of vertices and edges. We label each edge with a letter of our alphabet. We build a "path" through the graph, from the initial vertex to the final vertex, traversing our letters. The path is the word. Some paths are impossible, which is why some words cannot be calculated.
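The path-equals-word idea can be sketched with a tiny labeled graph (the vertices and edge labels below are made up for illustration): a word is reachable exactly when some path from the start vertex spells it.

```python
# Each edge carries a letter; a word is "reachable" if some path from
# the start vertex spells it, letter by letter.

def spellable(graph, start, word):
    """True if some path from `start` traverses edges spelling `word`."""
    frontier = {start}
    for letter in word:
        frontier = {v for u in frontier
                      for (lab, v) in graph.get(u, [])
                      if lab == letter}
        if not frontier:
            return False  # an impossible path: the word cannot be built
    return True

graph = {
    0: [("a", 1)],
    1: [("b", 2), ("c", 0)],
    2: [("r", 0)],
}
print(spellable(graph, 0, "abrac"))  # → True  (0-a-1-b-2-r-0-a-1-c-0)
print(spellable(graph, 0, "abba"))   # → False
```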

Fortunately in this case, we have a METRIC so we can measure the distance of our path. It's called a word metric. We use the word metric on the discrete group whose members are represented by the symbols in our alphabet.

To understand how this works, consider the group of integers with generator 1, and take the word -3; let's say it represents the tubulin alpha subunit. The most efficient way to represent the word (the globular folding sequence) is (-1)(-1)(-1). But there are other, longer ways to get to the same word (an equivalent globular protein).

This is how we get a METRIC for protein evolution, how we measure it. Once we have a metric we can get a norm (in group algebra this means the shortest length that gets the job done, that is to say, gives us a homologous word).
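A minimal sketch of the metric and norm on the integers with generating set {+1, -1} (the protein reading is the analogy above; the arithmetic itself is standard):

```python
# Word metric on the integers with generating set {+1, -1}: the norm of
# an element is the length of its shortest spelling in generators, which
# for an integer n is just abs(n); the metric is the norm of the
# difference of two elements.

def word_norm(n):
    return abs(n)

def word_metric(a, b):
    return word_norm(b - a)

shortest = [-1, -1, -1]          # an efficient spelling of -3, length 3
detour   = [-1, +1, -1, -1, -1]  # a longer spelling of the same element
assert sum(shortest) == sum(detour) == -3

print(word_norm(-3))       # → 3
print(word_metric(0, -3))  # → 3
```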

This method is NECESSARY when studying biology, because there is more than one protein that can get the job done. We are interested in ALL the ways of getting the job done, "all possible paths" - and evolution almost always takes more than one path. Of the 100,000 proteins in a human, FEWER THAN 100 are highly conserved, and these are the cases where it is likely that there is "only one" folding sequence that will get the job done. Everything else has latitude - that is to say, it will accept some limited mutations and continue to function. In English, the words "yes", "yeah", "uh-huh", and "damn straight" all indicate the affirmative, and all get the job done.

This is the science around protein evolution.

Pharmaceutical companies have been synthesizing novel proteins for at least 40 years, and it is only in the last five years that they've begun using these methods. They wish to design molecules with the same shape, that perform the needed function, but have different sequences that are easier to synthesize. This is how they do it.

This is a formal mathematical system that provides a metric and a norm, and IMPLIES a concept of angle via the graph embedding. This way scientists can deal with equations that predict in advance whether a protein will work in context.

The attractor for proteins is the FUNCTION, not the sequence. Some function is needed by the cell, and the purpose of evolution is to "find" such a function. Even if we know all the probabilities for point mutations and protein chain extensions, we still don't know whether the result will perform the function. These methods give us the answer.

More information on algebraic groups can be found here:

 
There is no clear problem statement here, only vague ideas like "trying to build" and "proper spelling sequence" and "right meaning".

The "word problem" is the problem of determining whether two data structures are "equivalent" - that's what the problem is, "are these equivalent?" - a question.

You've posted a verbose and meandering "lecture" here, with no clear explanation of what this has to do with protein synthesis, an activity that has nothing to do with the word problem.

You talk of nested stack automata, but these abstractions (stacks) do not exist in the natural world; they are human inventions (created by intelligent minds) that have no natural analog, just as there's no analog for tree-organized information (DNA is a chain) or any number of abstractions from finite-state machines to pushdown automata.

So what on earth is your post about, other than an attempt to feign erudition and "solve" a "problem" that isn't even clearly stated?

So start again: we have an alphabet of 26 symbols - well, what of it? How many total symbols do we have: 10? 1,000,000? What are we meant to do with a set of symbols randomly taken from some large set of symbols? Let's imagine a huge set of Scrabble tiles, each of which can be any of the 26 letters. What exactly is the problem we are striving to solve here?

I've designed and built programming languages, so I'm no novice to the overall problem domain and symbols, patterns, rewriting rules and so on.

Most recently I designed a grammar using Antlr that can parse stuff like this without problems:

if then = if then
if = else
else goto = then

That is, the language has NO reserved words - any word, even a keyword, can be used as a variable name - and one can write code in any number of human dictionaries, like English, German and so on. So please don't try to fool me with your mumbo jumbo.

[attachment: 1724770220720.png]


Each of those (and a language-agnostic tool can transform these from one language to another automatically) produces a structurally identical parse tree; there is no other programming language that supports that ability. There is nothing in nature that is equivalent to a digital computer - AKA a symbol manipulator.
 
$50 an hour, for you.
 
If you can't clearly state a problem then you'll likely never know if you actually solved it - that's my experience, but if it makes you happy...
 
You don't understand the problem?

Yet you continually argue against its feasibility.

You're proving me right.

You're not here for science.
 
The "word problem", yes; your problem, no idea! The post is appallingly written. A problem specification must be usable by both the developer and the tester; they must be able to independently build a system and test it. Your problem statement isn't fit for purpose.
 
Hilarious. And you call yourself an experienced programmer? LMAO. :p
 
Darwin said it best: "If it could be demonstrated that any complex organ existed which could not possibly have been formed by numerous, successive, slight modifications, my theory would absolutely break down."

26! ≈ 4*10^26. It's a really, really big number.
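As a sanity check on that magnitude, taking the count to be 26! (the number of orderings of 26 distinct symbols):

```python
# The number of orderings of 26 distinct symbols is 26!,
# which is about 4.03 * 10**26.
from math import factorial

print(factorial(26))  # → 403291461126605635584000000
```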

If you had a computer operating system made of only 26 lines and you thought to assemble them at random, well you're looking at Windows ME
 
The probability of a peptide bond forming at the air-water interface with a few copper ions in the water is 1 in 2.
 
Source, please? You often misinterpret stuff, so I'd like to check.

A "peptide bond" by the way is:

[attachment: 1724957752643.png]


So which amino acids are you referring to, what is their concentration, and so on - just share a source. These are legitimate science questions, so please don't throw another tantrum just because you don't like answering questions or don't know the answers.
 

So are the odds of a flipped 2-sided coin landing on heads!

OMG!

You solved it!

 
Okay. When I get time I'll document my sources for you.
 
lol :p

Yeah, that sounds about right. You get bonding about half the time.

In plain water it's about 1 in 10,000

At the air-water interface you gain a factor of at least 10, so 1 in a thousand.

Throw in a few copper ions, though, and now you get lots and lots of short amino-acid chains, which then continue bonding into longer chains.
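Composing the quoted figures (illustrative arithmetic only; the roughly 500x copper factor is implied by the 1-in-2 figure, not stated as a measurement):

```python
# Compose the claimed enhancement factors for peptide bond formation.
from fractions import Fraction

p_bulk_water = Fraction(1, 10_000)  # claimed: plain water
p_interface  = p_bulk_water * 10    # claimed: 10x at the air-water interface
copper_gain  = Fraction(1, 2) / p_interface  # factor implied by "1 in 2"

print(p_interface)  # → 1/1000
print(copper_gain)  # → 500
```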

"Proteins" long preceded life. They had to first be ingested by micelles, along with maybe a few nucleotides. Proteins regulate DNA, not the other way around.

Here, let me show you something:


Proteins unravel the knots at the right times. There was undoubtedly a rich library of proteins available before such things could happen.
 

So you're saying assembling a cell is as easy as the atmosphere contacting the ocean?

That's amazing!
 

So proteins and DNA evolved separately from random molecules, but have a perfectly symbiotic relationship?

That's amazing!
 

And there are 50,000 of these in a DNA?

That's amazing!
 

Was the first DNA strand fully formed and just waiting for its perfect protein Tinder match?
 
