A solution.
Problem
Link to the full problem on Hackerrank.
You are given a table, Projects, containing three columns: Task_ID, Start_Date and End_Date. It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.
If the End_Date of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.
Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that have the same number of completion days, then order by the start date of the project.
Sample Input
Sample Output
2015-10-28 2015-10-29
2015-10-30 2015-10-31
2015-10-13 2015-10-15
2015-10-01 2015-10-04
Explanation
The example describes following four projects:
Project 1: Tasks 1, 2 and 3 are completed on consecutive days, so these are part of the project. Thus start date of project is 2015–10–01 and end date is 2015–10–04, so it took 3 days to complete the project.
Project 2: Tasks 4 and 5 are completed on consecutive days, so these are part of the project. Thus, the start date of project is 2015–10–13 and end date is 2015–10–15, so it took 2 days to complete the project.
Project 3: Only task 6 is part of the project. Thus, the start date of project is 2015–10–28 and end date is 2015–10–29, so it took 1 day to complete the project.
Project 4: Only task 7 is part of the project. Thus, the start date of project is 2015–10–30 and end date is 2015–10–31, so it took 1 day to complete the project.
Full Solution (MySQL)
SELECT A.Start_Date, MIN(B.End_Date)
FROM
(SELECT Start_Date
FROM Projects
WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) AS A
INNER JOIN
(SELECT End_Date
FROM Projects
WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) AS B
WHERE A.Start_Date < B.End_Date
GROUP BY A.Start_Date
ORDER BY MIN(B.End_Date) - A.Start_Date ASC, A.Start_Date;
Solution Explanation
Visualising the problem’s given sample input and explanation:
Observation: a project’s Start_Date
is not in the End_Date
column and vice versa.
(1) Obtain projects’ start dates — Table A
SELECT Start_Date
FROM Projects
WHERE Start_Date NOT IN (SELECT End_Date FROM Projects);
Results:
(2) Obtain projects’ end dates — Table B
SELECT End_Date
FROM Projects
WHERE End_Date NOT IN (SELECT Start_Date FROM Projects);
Results:
(3) Match project start dates and end dates — INNER JOIN
SELECT A.Start_Date, MIN(B.End_Date)
FROM
(SELECT Start_Date
FROM Projects
WHERE Start_Date NOT IN (SELECT End_Date FROM Projects)) AS A
INNER JOIN
(SELECT End_Date
FROM Projects
WHERE End_Date NOT IN (SELECT Start_Date FROM Projects)) AS B
WHERE A.Start_Date < B.End_Date
GROUP BY A.Start_Date;
Results of the FROM
clause, before the SELECT
clause. (SQL executes the FROM
statement and any related joins before the SELECT
statement)
One project’s start date is has many end dates after the join, the correct end date for a particular start date is the minimum (earliest) of the End_Date
values that share the same Start_Date
.
For example, for Start_Date = 2015-10-01
, there are 4 end dates (2015-10-04, 2015-10-15, 2015-10-29, 2015-10-31), the End_Date
that corresponds to it is the least (earliest) of the End_Date =2015–10–04
. This logic informs the SELECT
statement: SELECT A.Start_Date, MIN(B.End_Date)
. The GROUP BY
clause is to enable the MIN
aggregation in the SELECT
clause.
(4) Apply required sorting
ORDER BY MIN(B.End_Date) - A.Start_Date ASC, A.Start_Date;
The problem requires the results to be sorted first by the number of days to complete the project in ascending order, then, by the start date of the project, presumedly in ascending order as well.
The number of days to complete the project can be calculated with MIN(B.End_Date) - A.Start_Date
.
Alternately, DATEDIFF(MIN(B.End_Date), A.Start_Date)
can be used to calculate the number of days required to complete each project.
Results: