If it is easy to verify whether a solution to a problem is correct, is the problem easy to solve? This question is posed as the P vs NP problem and is considered one of the most important unsolved problems in Mathematics. This post will be brief and forgo many technical details surrounding the question for the sake of accessibility.
P and NP are classes of problems, and let us say for now that problems in P are known to be “easy” and problems in NP are thought to be “hard”. Please excuse the vagueness, which will be overcome shortly. It is unknown whether or not P=NP. This would mean that these sets of problems are in fact the same, and problems thought to be difficult are in fact easy.
What is “easy to solve”?
Our consideration is on decision problems, which are those that request a yes or no answer. Problems in computer science are often solved with algorithms, which are sequences of directions. All the algorithms we’ll consider involve an input, and normally the algorithm is answering a question about the input. In order to run an algorithm in practice, we have to know that it will not take too long to execute.
In computer science, time is not measured by how many hours, minutes, seconds the algorithm takes. Time is measured by the maximum number of steps an algorithm could take. Furthermore, this number of directions is given in terms of the size of the input. Now it is true that some problems have algorithms which could take an infinite number of steps, but we’ll only consider problems with algorithms that take a finite number of steps.
Suppose given a list of numbers, a statistician wants to know if there is a number in the list strictly larger than 100. The input is obviously the list given. An algorithm for this problem would be to look at all the numbers and see if such a number exists in the list. For this example alone, we’ll outline the algorithm to illustrate that it is simply a sequence of directions:
- Look at the first number in the list.
- If that number is bigger than 100 return yes.
- Otherwise (if that number is ≤100):
- and if there is a next number in the list, go to step 2 considering the next number.
- and if that was the end of the list, return no.
If our list has n numbers in it, this algorithm executes at most n steps.
We classify an algorithm’s running time into two groups: those that are polynomial in n and those that are not. Examples of running times that are polynomial in n are n², 4, log n, 3n, 5n⁹-6n⁶+4n+2,… if an algorithm can be performed in a number of steps less than some multiple of a power of n, then it is said to run in polynomial time. P is the class of problems which have an algorithm that runs in polynomial time. Notice that the problem of determining whether any number in a list is greater than 100 is in P.
Now of course an algorithm which runs in time n is faster than an algorithm which runs in time n³, but we are less concerned with this difference. When n is large, any function polynomial in n grows slower than any function exponential in n. To say this a bit more formally, for any fixed integer k, nᵏ grows slower than (1+x)ⁿ for fixed x>0 and large enough n. Some examples of running times which are not polynomial are 2ⁿ, n!, nⁿ. The graph below illustrates that these problems are more computationally expensive to run.
An algorithm which runs in time 2ⁿ on an input of size 270 elements could take operations, and an algorithm which runs in time n! on an input of size 85 elements could take 85! operations. For sake of comparison, the number of particles in the observable universe is around , and both and . So the number of operations to complete these algorithms could be more than the number of particles in the observable universe.
What is “easy to verify”?
NP is the class of problems whose solutions can be verified in polynomial time. The abbreviation stands for nondeterministic polynomial time. Obviously if a problem can be solved in polynomial time, then that solution is correct. So verifying the answer is achieved just by finding it. This implies that every problem in P is also in NP, which is written P ⊂ NP. It is not known whether every problem in NP is also in P; if this were the case then the sets of problems are the same, which is written P=NP. We already established that the list problem above is in P, thus it’s in NP.
Suppose a cartographer were given a map of n countries and would like to know if there exists a coloring of the map such that every country is colored and no bordering countries are the same color. The simple algorithm for this problem would be to try every coloring, and check to see if any of them are valid for the cartographer’s condition. This would involve testing a lot of colorings: for every country, the cartographer has 3 choices on what color it could be, which means they’ll have to test 3·3⠤⠤3·3=3ⁿ different colorings. As the number of colorings to test is exponential, this algorithm is not polynomial in n. Thus, given this algorithm, we do not know whether the coloring problem is in P, but we can easily show that it’s in NP.
Now, let’s say a cartographer were given a map of n countries and a coloring scheme where every country was assigned one of 3 colors, and they’d like to verify whether this coloring is such that no bordering countries are the same color. A simple algorithm would be to take every pair of countries which share a border and check if they have the same color. If the algorithm ever finds a pair of bordering countries with the same color, it returns no. Otherwise, it returns yes. This algorithm would take less than n² steps, which would be how many steps were needed if all pairs of countries were checked. As a specific color scheme could be checked in less than n² steps, this coloring problem is also in NP.
Let’s briefly mention two other classes of “hard” problems. NP-hard problems are the set of problems which are at least as hard as the hardest problems in NP. This means that if we were able to find an algorithm which runs in polynomial time for some problem which is NP-hard, then all problems in NP could be solved in polynomial time. A very important class of problems are NP-complete, which are the problems that are in both NP and NP-hard. Below are diagrams which show the relationship between these sets of problems, depending on whether P=NP or P≠NP.
Famous NP-complete problems
- Traveling salesman problem: Given a set of cities, the distance between every pair of cities, and a distance D, can a salesman travel to every city exactly once and return to where they started covering a distance at most D?
- Knapsack problem: Given a set of items, each with a weight and a value, and a carrying capacity, can a total value (the sum of the values of items carried) of V be carried without exceeding the carrying capacity?
- 3-coloring: Given a set of dots with lines connecting pairs of dots, is it possible to color the dots with three colors so that no pair of dots colored the same are connected?
It is widely believed that P≠NP. In fact, plenty of research is done under the assumption that P≠NP. If this is ever shown to be true, then we know a large number of problems which cannot be solved in polynomial time.
The impact of a proof that P=NP would of course be amazing theoretically, but its practical impact depends on the proof. A constructive proof would mean that a method or algorithm is created which could actually solve an NP-complete problem in polynomial time. To name a few of the positive consequences, protein structure prediction and several optimization problems would be solvable in polynomial time. On the down side, if we were able to solve the 3-SAT problem in polynomial time, most of our security systems involving cryptography would no longer be safe.