Full source code is available here.
This is the kind of problem that is beyond my current level of problem solving skills. After reading the problem statement you start thinking about possible approaches that all turns out to be impractical or unclear and the only thing you say to yourself is “I have no idea how to solve this problem” and then you start to feel bad and incompetent. You don’t need to, this is a very hard problem that requires an acute sense of observation and analysis. I hope that with some practice I will be able to solve complex problems like this during competitions.
This time I had to rely on the Problem Editorial to get my head around to the solution. Turns out that even understanding the solution is not easy 🙂 In this post, I created an easier to understand implementation that hopefully will make it clear how to solve the problem and teach you some valuable lessons.
Often one of the reasons of analysis paralysis is the inability to decompose the problem in smaller problems. Let’s start from the problem of calculating the score of a Cartesian tree and see what we can learn from it.
Score of a Cartesian Tree
Looking at the example in the problem statement it is immediately apparent that the root of the Cartesian tree is the minimum element of the sequence of numbers. This is the key main observation even if it might seem obvious because a Cartesian tree is a min heap.
The smallest number split the sequence in two parts (left and right). You then notice that the left tree correspond to the Cartesian tree of the left sequence and the right tree correspond to the Cartesian tree of the right sequence.
This suggest a recursive way to calculate the score of a Cartesian tree. The score is the sum of the score of the root (the distance between the two children of the root if they exist) and the score of the left and right tree respectively.
The following image should make this clear.
Sum for sequence with fixed minimum and partitions
Let’s now consider the problem of calculating the sum for the sequences that have the same fixed minimum and the same set of elements for the left and right partitions.
We can ask ourselves the following question: if we already knew the sum for sequences of length lower than n, could we solve this problem recursively?
The answer is yes.
We can use the recursive structure of Cartesian trees to decompose this problem in three parts.
The result is the sum of the following scores:
- The sum for all the possible left sequences
- The sum for all the possible right sequences
- The sum for all the possible root scores
This is the high level method.
For every permutation of elements in the right sequence, we need to assign all permutations of elements in the left sequence. This means that the sum for the left subtrees is given by the following method where p is the permutation.
For every permutation of elements in the left sequence, we need to assign all permutations of elements in the right sequence. This means that the sum for the right subtrees is given by the following method.
To calculate the sum for all the possible root scores is a bit more tricky. The children of the root correspond to the smallest element in the left sequence and the smallest element in the right sequence (if they exist). We can consider all the possible pairs of leftMinIndex and rightMinIndex and decompose the problem.
Now we need to calculate the sum for a sequence with a fixed root, a fixed left root and a fixed right root. The root score is the distance between the fixed right root and the left right root but we need to calculate how many possible sequences exist with this constraints. The possible left sequences with a fixed left root are the permutations of (minIndex – 1) while the possible right sequences with a fixed right root are the permutations of (n – 1 – minIndex – 1). So we need to multiple the root score by these two numbers.
Sum for sequence with fixed minimum
All right, I guess you might feel a bit lost at this point.
So far, we solved the problem only for sequences with fixed minimum and fixed partitions with the big assumption that we knew already the sum for sub-sequences of length less than n.
Let’s now expand our input space considering all the sequences with a fixed minimum.
We can do this easily just considering all the possible partitions we can do and multiply this value to the result of the previous sub-problem. The left sequence contains minIndex elements. We can choose any sequence of elements between n – 1 (we exclude the minimum). So the number of possible partitions is the combinations of n – 1 and minIndex.
Sum for sequence
There are N possible positions for the minimum value so we can calculate the sum of any sequence of length N calculating the sum for sequence with fixed minimum N times and adding the results together.
No, we still have a big assumption. We are assuming that we already know the sum for sequences of length less than n.
We can solve this problem efficiently using a technique called Dynamic Programming. We basically calculate all the sums in order starting from 1 and we cache the result so that it can be used to calculate the sum for larger sequences.
Now that we have all the sums we can solve the problem simply returning sum[n] % mod.
The PrecomputePermutations method store the first n factorials in the array p for efficient later use.
The PrecomputeCombinations method create the Pascal triangle to efficiently retrieve the combination n, k.
What about module arithmetic and performance?
The problem ask to return the result modulo mod. You probably noticed that I used the class BigInteger to store the result of the computation because classic integer types are too small to store the intermediate results. It is possible to solve the problem without using BigInteger but it requires to be more careful to avoid overflows and the final code is difficult to understand. We are lucky enough in C# to have the BigInteger class and the performance of BigInteger are not too bad. Of course, it is wise to be smart and try to reduce the number of digits where possible to increase performance.
For example, my first implementation did not use mod in PrecomputePermutations and it took 2.6 seconds in the worse case scenario (with n = 100). Simply doing a modulo operation in permutations reduced the time to less than 1 second in the worse case that is sufficient to pass all TopCoder system tests on time.
You can read this TopCoder forum post for more information about modulo arithmetic in programming: Calculating Large Numbers Modulo q.
I spent a lot of hours to think and learn the solution of this problem and finally write this post.
This is what I learnt:
- Use the BigInteger class in C# when the problem ask a result in modulo. In this way, you can avoid unnecessary performance optimizations and overflow errors. Remember that if you fail a single system test you get a score of zero in TopCoder!
- What a Cartesian tree is and some of its properties
- Precomputing permutations and combinations is a good idea
- How to decompose a problem in sub-problems
- How to identify recursive definitions and apply dynamic programming
It has being fun!