Skip to content

Latest commit

 

History

History
166 lines (94 loc) · 31.4 KB

4 - Between feasible and unfeasible.md

File metadata and controls

166 lines (94 loc) · 31.4 KB

Between feasible and unfeasible

This is another key concept: we want to look at what happens between the feasible and unfeasible. We labeled them by their presence inside P (the good ones), or their absence (the bad ones). These are non-trivial classes, but what happens inbetween?

P is the embodyment of tractable (i.e. feasible) problems, while EXP (containing the whole P) contains problem that cannot be solved in polynomial time too. So, what we're looking for are classes of an hypothetical A:

Is there anything interesting in this grey land of problems which are known not to be in P, but trivially in EXP? We'll discover that there's a lot to be done, and a lot of interesting properties to be discovered. Many of the problems we see in AI are in this class.

The NP Class

The first concept is the dicotomy between creating and verifying. Very often, the language we'd like to classify can be written as follows:

The language is the language of all strings for which there exists a certificate y such that , seen as a pair, lies in a set of pairs of . If we want to conclude that is in , we characterize it by another language A (which is a set of pairs of strings), using a certificate (which exists). The other crucial part is , which is a polynomial function. The certificate which certifies the presence of in must be sufficiently. Think about as a test about the fact that is an appropriate certificate for . Sometimes this language is itself decidable in polynomial time. Does this imply that itself is decidable in polynomial time?

The point is that one way of checking whether is in is looking for certificates. If we find a certificate the problem is solved! If we cannot find a certificate which satisfies the test, and if we check all certificates we can say that is not in . The point is there's too many certificates: the strings of length are exponentially many. Checking all of them is too expensive. Not necessarily, given , we can check all possible (smaller than ). This is not the only way though. We could follow alternative rules.

There is a dicotomy between crafting a solution for the problem, and checking a candidate solution to be an actual solution. Finding could be much more difficult than checking if is a solution. Looking for the appropriate certificates cannot be done in polynomial time. Checking is easier. Crafting requires some sort of creativity, which algorithms generally do not have.

Modern definition of NP

And that's it. The complexity class NP can be defined as the set of all languages L for which there exists a polynomial and a polynomial time TM such that

We want M to return 1 when fed with the pair . NP is the class of languages for which checking certificates can be checked in polynomial time. This doesn't mean that we can find a solution in polynomial time, rather check certificates.

So, M can be said verifier for . The class NP does not have a natural counterpart. It is a class of languages, and it's important for the definition that is a language. Otherwise, it's not so easy to think about it as a generalization.

A theorem on which we will spend some time is the following: NP is one example of a class which is between P and EXP:

Examples of NP languages include Maximum Independent Set, Subset Sum, Composite Numbers (actually in P), Factoring, Decisional Linear Programming (actually in P), Decisional 0/1 Linear Programming.

Original definition of NP

The class NP can also be defined using a variant of Turing machines, called the NonDeterministic Turing Machines or NDTM (this is the original definition and is the reason for the N in NP: non-deterministic). A NDTM has and additional state and two transition functions and instead of one and at each step one of them is chosen non-deterministically (currently only theoretical, not implementable).

We say that a NDTM accepts an input iff a possible evolution of with input which reaches .

runs in time iff for every and for every possible nondeterministic evolution reaches or within steps with .

For every and we say that iff there is a NDTM working in time such that iff .

Some problems are known to be in or outside of . For example, there is no polynomial time algorithm for the maximum independent set problem.

So, could we just stop here and say that all the problems in NP and not known to be in P are equivalent? They could be but nobody knows. In fact, complexity theory has done much around these, finding a way of classifying problems in and find those in which difficulty is maximal.

Generally, any problem is maximal or not. To better understand this concept, we'll have to introduce nondeterministic Turing Machines.

These are the way in which were originally defined. The in stands for that.

Now, what does nondeterministic actually mean? Definitely, there is just one difference: there are two transition functions, and the machine chooses a random one at every step. The choice is non-deterministic. There is no furthere information on how the transition function is chosen. This is not a realistic computational model: choosing the and is something the machine cannot do. Nondeterminism doesn't really exist in computer science. The machine can evolve in two different ways, making these TM not so realistic.

We assume that there's one state which we use to state if a NDTM accepts the input iff there exist one among the many possible evolutions of the machine when fed with makes it reach . In a deterministic machine, there is just one possible evolution, while now the evolutions become a binary tree. There just has to be one evolution. If none exists, the input is rejected.

This existance condition is connected with the concept of certificates: choosing a transition function is similar to finding a certificate.

We say that NDTM M runs in time iff for every and for every possible nondeterministic evolution, M reaches either the halting state or within steps.

We have lifted all the definitions we had to NDTM.

Reductions

A reduction is just a mapping from a binary string to another binary string. Such a reduction could be a way of comparing difficulty of languages.

We'd want a function fo to turn a element inside to an element inside . It may be that all the elements of L are mapped to the same element of H. It is very rare, but that may happen.

A problem can be NP-Hard or NP-complete. It is the first case if m

It is in HP-hard if its is as hard for every language in , and complete if is in . NP-hardness does not imply NP-completeness, while the inverse is true.

NP completeness is a stronger assumption: we also want the problem to be in .

IF we were to describe NP complete problems, it would not be a subset of NP, but just have an intersection. If a problem is NP-Hard, translating any string to is feasible.

The first interesting example of an NP-complete problem is simulating a Turing Machine. We introduce TMSAT, a language composed of quadruples, where the first component is a turing machine, hte second its input, the third a string of length which value doesn't matter, and the fourth be a string of length . We are interested in those quadruples such that there exists a certificate such that the machine alpha outputs 1 on an input which is computed from but also from . We are kind of checking if the TM indexed by outputs 1 on the right input, within a time bound. If there weren't any bounds, the problem would be quite complicated, close to the halting problem. We therefore explicitly state that we want the machine to halt in steps. It's clear that we are doing something that is hard: we are encoding all TM. This is not so useful, as it can't prove other problems to be NP complete.

Cook Levin theorem

We want to take any language in NP, and we want to show that is than SAT.

The machine tests certificates. Then, we define a polynomial time transformation, which given a string returns a boolean formula, the encoding of a CNS such that the output formula is satisfied iff the closing language . We need to dig into M, and understand that even if it is a machine can be simulated by logic. Even if the machine is automatic, it works by logic, i.e. it checks whether the input is a certain symbol, the state is something, and evolve. There's a way to encode this in propositional logic, being quite laborious but very elegant.

Suppose we're studying a language L, and we're convinced that the underlying problem is hard. How could we prove this? Proving that L is in EXP could be easy, but this doesn't tell us anything about the difficulty of L: knowing that L is in EXP doesn't mean much. We could prove that L is in NP, but again this doesn't mean much. The way to proceed is trying to prove that L is NP-complete. The NP-complete problems are at least as hard as problems in NP.

We then have to prove two statements: that L is in NP (typically easy) and that any other language in NP can be reduced to L. We could prove this directly, but it is quite difficult. We more often prefer proving that a certain language I, which is already known to be NP-complete (for example SAT) is smaller or equal to L. If you do so, that is fine: the relation is transitive, meaning that you can go through I.

Proving a problem hard

There's this language that we want to prove is hard. We could try to prove that it is in P, but it won't be that helpful: probably it isn't, as it might be easy if so. We might try to prove it to be in EXP. This, though, doesn't correspond to its hardness. The fact that there's an exponential algorithm doesn't mean much: a really efficient algorithm might not be the only one, as there might be efficient ones. This way, we could always use a stupid algorithm to prove a problem hard. Finally, we may prove that is NP-complete: this way we can prove that the problem is not so hard (being in NP) but not so easy either.

So, how do we do that? We have to prove two statements, the first one being that is in NP (as we know, this means we need to find a polynomial P and a TM working in polynomial time such that the problem is solved by it). The second step is trying to prove that any other language in NP is lower or equal than . We therefore have to find a third language between and which can itself be reduced to we can conclude that any language can: the relation is transitive.

Of course, if the problem is NP-hard we can always find that language. This is more about creativity than computing. Coming up with the right problem and the right reduction might be difficult. That's why we're rarely asked to solve these problems in exams, and when we are, the problem is suggested.

For example, the Maximum Independent Set problem is NP-complete.

Even the subset sum problem is NP-complete. The problem is trivial in NP (the certificate is just a subset), but we can also prove that the problem is NP-complete, so it's not in the easy side of the class.

This turns into the graph of NP-complete: we can start by stipulating that a pair of NP-complete problems, then go on with the others. For example, we know SAT to be NP-complete, then we can go to 3SAT, then we can go from SAT to INDSET and ILP. To reach SUBSETSUM, we go through OL3SET. This graph is giant: lots of problems are known to be NP-complete!

We would be happy if in some way we could solve these problems efficiently, even though they are NP-complete. Is the hope completely lost? Actually no! We know that no polynomial time for is available, because P would be equal to NP then. Once we have a problem which is NP-complete, and in particular is NP, we can try to reduce the problem to SAT, and since it is NP-complete we can always do it. Of course, this reduction can always be done, and it allows us to map strings in to strings in SAT. There are tools that solve SAT problems, called SAT solvers which take in input a CNF and decide if it's solvable. There are lots of problems in which SAT solvers work real well.

NP-hardness and NP-completeness

The language is said to be polynomial-time reducible to another language () iff there is a poly-time computable function such that . is a pre-order (reflexive and transitive).

For classes P or above it and , then is at least as difficult as .

A language is said to be:

  • NP-hard if . This means that it is at least as hard as any language in NP. Simplifying it means that it cannot be too easy. It could also be un-computable, NP-complete or outside NP.
  • NP-complete if is NP-hard. Note that NP-hardness does not imply NP-completeness as a NP-hard language may be un-computable or outside NP.

Note that:

  • is NP-hard .
  • is NP-complete .

No such language has been found and has not yet been proven (famous P vs NP problem).

Venn diagram of P and NP

Proving that a problem is NP-complete proves that the problem is not so hard (being in NP), but not so easy either (unless P = NP).

If we want to prove to be NP-complete we have to prove two statements:

  • is NP (see above)
  • for any other language . Since is transitive, we can simply prove that, where is a language already known as NP-complete.

The Cook-Levin Theorem

A kCNF (k-Conjunctive Normal Form) is a propositional formula which is a conjunction of disjunctions ("clauses") which contain at most literals.

The following languages are NP-complete:

This is a relevant proof of existence of NP-complete problems.

Previous section · Next section