Data Science Interview Question at Dropbox

Tri Juhari
3 min readMar 27, 2021

--

Finding Difference Salary

Hello everyone, on this occasion I would like to share insights about solving problems from the Interview Question for the role data science at the Airbnb. In this problem, will be solved using the python programming language and PostgreSQL.

So in the case study there is a company that wants to find the difference in employee salaries between the marketing and engineering divisions, the data is stored in a company database including db_employee and db_dept. In the employee database (db_employee) contains data related to company employees, while the department database (db_dept) contains data related to departments or divisions in the company.

The following displays the db_employee database data row data consisting of 5 columns, namely id, first_name, last_name, salary, department_id.

Basis data employee

Furthermore, for the department database (db_dept) as shown below. The department database consists of 2 columns namely id and department.

After knowing the data related to the data that will be used, then look for the difference in salaries for the engineering and marketing department.

import pandas as pd 
import numpy as np

df = pd.merge(db_employee, db_dept, how ='left',
left_on=['department_id'], right_on=['id'])
df1 = df[df['department']== 'engineering']df_eng = df1.groupby('department')['salary'].max().reset_index(
name='eng_salary')
df2 = df[df['department'] == 'marketing']df_mkt = df2.groupby('department')['salary'].max().reset_index(name = 'mkt_salary')result = pd.DataFrame( df_mkt['mkt_salary'] - df_eng['salary'])
result.columns= ['salary_diff']
result

The Output from the merge process

df = pd.merge(db_employee, db_dept, how ='left',
left_on=['department_id'], right_on=['id']
)
print(df)

The merge process above is the same as when using SQL queries, namely by using the LEFT JOIN command.

The next process is filtering because it will only look for differences in salaries for employees of the marketing and engineering divisions, so the following command can be carried out.

df1 = df[df['department']== 'engineering']
df2 = df[df['department'] == 'marketing']
df1 ( engineering)
df2(marketing)

The next stage is to look for the difference in salary between the two divisions, but before that, the max value for each division is first searched after finding the difference between the highest salary for the two divisions.

df_eng = df1.groupby('department')['salary'].max().reset_index(
name='eng_salary')
df_mkt = df2.groupby('department')['salary'].max().reset_index(name = 'mkt_salary')

Here are the results of the command code above.

Gaji tertinggi di department engineering
Gaji tertinggi di department marketintg

Furthermore, the results of the differences between the two divisions are 48187–45787 = 2400 or can be searched using the command below.

result = pd.DataFrame( df_mkt['mkt_salary'] - df_eng['salary'])

--

--

No responses yet