**12.1 What are Selection Algorithms?**

Selection algorithm is an algorithm for finding the k th smallest/largest number in a list (also called as k th order statistic). This includes finding the minimum, maximum, and median elements. For finding the k th order statistic, there are multiple solutions which provide different complexities, and in this chapter we will enumerate those possibilities.

**12.2 Selection by Sorting**

A selection problem can be converted to a sorting problem.In this method, we first sort the input elements and then get the desired element. It is efficient if we want to perform many selections.

For example, let us say we want to get the minimum element. After sorting the input elements we can simply return the first element (assuming the array is sorted in ascending order). Now, if we want to find the second smallest element, we can simply return the second element from the sorted list.

That means, for the second smallest element we are not performing the sorting again. The same is also the case with subsequent queries. Even if we want to get k th smallest element, just one scan of the sorted list is enough to find the element (or we can return the k th -indexed value if the elements are in the array).

From the above discussion what we can say is, with the initial sorting we can answer any query in one scan, O(n). In general, this method requires O(nlogn) time (for sorting), where n is the length of the input list. Suppose we are performing n queries, then the average cost per operation is just nlogn/n=O(logn) . This kind of analysis is called amortized analysis.

**12.3 Partition-based Selection** **Algorithm**

For the algorithm check Problem-6. This algorithm is similar to Quick sort.

**12.4 Linear Selection Algorithm – Median of Medians Algorithm**

Refer to Problem-11.

**12.5 Finding the K Smallest Elements in Sorted Order**

For the algorithm check Problem-6. This algorithm is similar to Quick sort.

**12.6 Selection Algorithms:** **Problems & Solutions**

** Problem-1** Find the largest element in an array A of size n.

**Solution**: Scan the complete array and return the largest element.

` ````
```
void FindLargestInArray(int n, const int A[]) { int large =A[0];
for (int i=1;i<=n-1; i++) if(A[i]> large) large = A[i]; printf("Largest: %d", large);

Time Complexity – O(n). Space Complexity – O(1).

**Note:** Any deterministic algorithm that can find the largest of n keys by comparison of keys takes at least n -1 comparisons.

**Problem-2** Find the smallest and largest elements in an array A of size n.

** Solution**:

` ````
```
void FindSmallestAnd LargestInArray (int A[], int n) {
int small =A[0]; int large =A[0];
for (int i = 1; i <= n-1; i++) if(A[i] < small)
small =A[i];
else if(A[i]> large)
large = A[i];
printf("Smallest:%d, Largest:%d", small, large);

Time Complexity – O(n). Space Complexity – O(1). The worst-case number of comparisons is 2(n – 1).

**Problem-3** Can we improve the previous algorithms?

**Solution**: Yes. We can do this by comparing in pairs.

` ````
```
// n is assumed to be even. Compare in pairs. void FindWithPairComparison (int A[], int n) { int large = small = -1;
for (int i=0; i<=n-1; i = i + 2) { if(A[i] < A[i+1]) {
if(A[i] < small)
small = A[i];
if(A[i+1]> large)
// Incrementi by 2.
large = Ali + 1];
}
else{
if(A[i+1] < small)
small =A[i+1];
if(A[i]> large)
large = A[i];
}
printf("Smallest:%d, Largest:%d", small, large);

Time Complexity – O(n). Space Complexity – O(1).

Number of comparisons:

**Summary:**

**Note**: For divide and conquer techniques refer to Divide and Conquer chapter.

**Problem-4** Give an algorithm for finding the second largest element in the given input list of elements.

**Solution: Brute Force Method Algorithm**:

• Find largest element: needs n – 1 comparisons

• Delete (discard) the largest element

• Again find largest element: needs n – 2 comparisons

Total number of comparisons: n – 1 + n – 2 = 2n – 3

**Problem-5** Can we reduce the number of comparisons in Problem-4 solution?

**Solution**:** The Tournament method**: For simplicity, assume that the numbers are distinct and that n is a power of 2. We pair the keys and compare the pairs in rounds until only one round remains. If the input has eight keys, there are four comparisons in the first round, two in the second, and one in the last. The winner of the last round is the largest key. The figure below shows the method.

The tournament method directly applies only when n is a power of 2. When this is not the case, we can add enough items to the end of the array to make the array size a power of 2. If the tree is complete then the maximum height of the tree is logn. If we construct the complete binary tree, we need n – 1 comparisons to find the largest. The second largest key has to be among the ones that were lost in a comparison with the largest one. That means, the second largest element should be one of the opponents of the largest element. The number of keys that are lost to the largest key is the height of the tree, i.e. logn [if the tree is a complete binary tree]. Then using the selection algorithm to find the largest among them, take logn – 1 comparisons. Thus the total number of comparisons to find the largest and second largest keys is n + logn – 2.

**Problem-6** Find the k-smallest elements in an array S of n elements using partitioning method.

**Solution**: Brute Force Approach: Scan through the numbers k times to have the desired element. This method is the one used in bubble sort (and selection sort), every time we find out the smallest element in the whole sequence by comparing every element. In this method, the sequence has to be traversed k times. So the complexity is O(n × k).

** Problem-7** Can we use the sorting technique for solving Problem-6?

**Solution**: Yes. Sort and take the first k elements.

1. Sort the numbers.

2. Pick the first k elements.

The time complexity calculation is trivial. Sorting of n numbers is of O(nlogn) and picking k elements is of O(k). The total complexity is O(nlogn + k) = O(nlogn).

**Problem-8** Can we use the tree sorting technique for solving Problem-6?

**Solution**: Yes.

1. Insert all the elements in a binary search tree.

2. Do an InOrder traversal and print k elements which will be the smallest ones. So, we have the k smallest elements.

The cost of creation of a binary search tree with n elements is O(nlogn) and the traversal up to k elements is O(k). Hence the complexity is O(nlogn + k) = O(nlogn).

**Disadvantage:** If the numbers are sorted in descending order, we will be getting a tree which will be skewed towards the left. In that case, the construction of the tree will be 0 + l + 2 + … + (n– 1) which is O(n 2 ). To escape from this, we can keep the tree balanced, so that the cost of constructing the tree will be only nlogn.

**Problem-9** Can we improve the tree sorting technique for solving Problem-6?

**Solution**: Yes. Use a smaller tree to give the same result.

1. Take the first k elements of the sequence to create a balanced tree of k nodes (this will cost klogk).

2. Take the remaining numbers one by one, and

a. If the number is larger than the largest element of the tree, return.

b. If the number is smaller than the largest element of the tree, remove the largest element of the tree and add the new element. This step is to make sure that a smaller element replaces a larger element from the tree. And of course the cost of this operation is logk since the tree is a balanced tree of k elements.

Once Step 2 is over, the balanced tree with k elements will have the smallest k elements. The only remaining task is to print out the largest element of the tree.

Time Complexity:

1. For the first k elements, we make the tree. Hence the cost is klogk.

2. For the rest n – k elements, the complexity is O(logk).

Step 2 has a complexity of (n – k) logk. The total cost is klogk + (n – k) logk = nlogk which is O(nlogk). This bound is actually better than the ones provided earlier.

**Problem-10** Can we use the partitioning technique for solving Problem-6?

**Solution:** **Yes. **

**Algorithm**

1. Choose a pivot from the array.

2. Partition the array so that: A[low…pivotpoint – 1] <= pivotpoint <= A[pivotpoint + 1..high].

3. if k < pivotpoint then it must be on the left of the pivot, so do the same method recursively on the left part.

4. if k = pivotpoint then it must be the pivot and print all the elements from low to pivotpoint.

5. if k > pivotpoint then it must be on the right of pivot, so do the same method recursively on the right part.

The top-level call would be kthSmallest = Selection(1, n, k).

` ````
```
void Find Median(int A[], int alo, int ahi, int B[], int blo int bhi) {
amidalo + (ahi-alo)/2; amed = a[amid];
bmid=blo + (bhi-blo)/2;
bmed = b[bmid];
if( ahi alo+ bhi blo <4) {
}
Handle the boundary cases and solve it smaller problem in O(1) time. return;
else if(amed

Time Complexity: O(n 2 ) in worst case as similar to Quicksort. Although the worst case is the same as that of Quicksort, this performs much better on the average [O(nlogk) – Average case].

**Problem-11** Find the k th -smallest element in an array S of n elements in best possible way.

**Solution**: This problem is similar to Problem-6 and all the solutions discussed for Problem-6 are valid for this problem. The only difference is that instead of printing all the k elements, we print only the k th element. We can improve the solution by using the median of medians algorithm. Median is a special case of the selection algorithm. The algorithm Selection(A, k) to find the k th smallest element from set A of n elements is as follows:

Before developing recurrence, let us consider the representation of the input below. In the figure, each circle is an element and each column is grouped with 5 elements. The black circles indicate the median in each group of 5 elements. As discussed, sort each column using constant time insertion sor

Components in recurrence:

• In our selection algorithm, we choose m, which is the median of medians, to be a pivot, and partition A into two sets X and Y. We need to select the set which gives maximum size (to get the worst case).

• The time in function Selection when called from procedure partition. The number of keys in the input to this call to Selection is n/5 .

• The number of comparisons required to partition the array. This number is length(S), let us say n.

**Problem-12** In Problem-11, we divided the input array into groups of 5 elements. The constant 5 play an important part in the analysis. Can we divide in groups of 3 which work in linear time?

**Problem-13** As in Problem-12, can we use groups of size 7?

**Problem-14** Given two arrays each containing n sorted elements, give an O(logn)-time algorithm to find the median of all 2n elements.

**Solution**: The simple solution to this problem is to merge the two lists and then take the average of the middle two elements (note the union always contains an even number of values). But, the merge would be Θ(n), so that doesn’t satisfy the problem statement. To get logn complexity, let medianA and medianB be the medians of the respective lists (which can be easily found since both lists are sorted). If medianA == medianB, then that is the overall median of the union and we are done. Otherwise, the median of the union must be between medianA and medianB. Suppose that medianA < medianB (the opposite case is entirely similar). Then we need to find the median of the union of the following two sets:

So, we can do this recursively by resetting the boundaries of the two arrays. The algorithm tracks both arrays (which are sorted) using two indices. These indices are used to access and compare the median of both arrays to find where the overall median lies

` ````
```
int Selection (int low, int high, int k) { int pivotpoint; if(low==high)
}
return S[low];
else {
pivotpoint = Partition(low, high);
if(k == pivotpoint)
return S[pivotpoint]; //we can print all the elements from low to pivotpoint. else if(k < pivotpoint)
return Selection (low, pivotpoint - 1, k);
else return Selection (pivotpoint + 1, high, k);
void Partition (int low, int high) {
int i, j, pivotitem;
pivotitem S[low];
j-low;
for (i= low + 1; i <= high; i++)
if(S[i] < pivotitem) {
j++;
Swap S[i] and S[j];
}
pivotpoint = j;
Swap S[low] and S[pivotpoint];
return pivotpoint;

Time Complexity: O(logn), since we are reducing the problem size by half every time.

** Problem-15** Let A and B be two sorted arrays of n elements each. We can easily find the k th smallest element in A in O(1) time by just outputting A[k]. Similarly, we can easily find the k th smallest element in B. Give an O(logk) time algorithm to find the k th smallest element overall {i.e., the k th smallest in the union of A and B.

**Solution**: It’s just another way of asking Problem-14.

**Problem-16** Find the k smallest elements in sorted order: Given a set of n elements from a totally-ordered domain, find the k smallest elements, and list them in sorted order. Analyze the worst-case running time of the best implementation of the approach.

**Solution**: Sort the numbers, and list the k smallest. T(n) = Time complexity of sort + listing k smallest elements = Θ(nlogn) + Θ(n) = Θ(nlogn).

** Problem-17** For Problem-16, if we follow the approach below, then what is the complexity?

**Solution**: Using the priority queue data structure from heap sort, construct a min-heap over the set, and perform extract-min k times. Refer to the Priority Queues (Heaps) chapter for more details.

** Problem-18** For Problem-16, if we follow the approach below then what is the complexity? Find the k th -smallest element of the set, partition around this pivot element, and sort the k smallest elements.

**Solution**: T (n) = Time complexity of kth – smallest + Finding pivot + Sorting prefix = Θ(n) + Θ(n) + Θ(klogk) = Θ(n + klogk) Since, k ≤ n, this approach is better than Problem-16 and Problem-17. Problem-19 Find k nearest neighbors to the median of n distinct numbers in O(n) time. Solution: Let us assume that the array elements are sorted. Now find the median of n numbers and call its index as X (since array is sorted, median will be at n/2 location). All we need to do is select k elements with the smallest absolute differences from the median, moving from X – 1 to 0, and X + 1 to n – 1 when the median is at index m.

Time Complexity: Each step takes Θ(n). So the total time complexity of the algorithm is Θ(n).

**Problem-20** Is there any other way of solving Problem-19?

**Solution**: Assume for simplicity that n is odd and k is even. If set A is in sorted order, the median is in position n/2 and the k numbers in A that are closest to the median are in positions (n – k)/2 through (n + k)/2. We first use linear time selection to find the (n – k)/2, n/2, and (n + k)/2 elements and then pass through set A to find the numbers less than the (n + k)/2 element, greater than the (n – k)/2 element, and not equal to the n/ 2 element. The algorithm takes O(n) time as we use linear time selection exactly three times and traverse the n numbers in A once.

** Problem-21** Given (x,y) coordinates of n houses, where should you build a road parallel to x-axis to minimize the construction cost of building driveways?

**Solution**: The road costs nothing to build. It is the driveways that cost money. The driveway cost is proportional to its distance from the road. Obviously, they will be perpendicular. The solution is to put the street at the median of the y coordinates.

**Problem-22** Given a big file containing billions of numbers, find the maximum 10 numbers from that file.

**Solution**: Refer to the Priority Queues chapter.

**Problem-23** Suppose there is a milk company. The company collects milk every day from all its agents. The agents are located at different places. To collect the milk, what is the best place to start so that the least amount of total distance is travelled?

**Solution**: Starting at the median reduces the total distance travelled because it is the place which is at the center of all the places.