Data Science Interview Question at Dropbox
Finding Difference Salary
Hello everyone, on this occasion I would like to share insights about solving problems from the Interview Question for the role data science at the Airbnb. In this problem, will be solved using the python programming language and PostgreSQL.
So in the case study there is a company that wants to find the difference in employee salaries between the marketing and engineering divisions, the data is stored in a company database including db_employee and db_dept. In the employee database (db_employee) contains data related to company employees, while the department database (db_dept) contains data related to departments or divisions in the company.
The following displays the db_employee database data row data consisting of 5 columns, namely id, first_name, last_name, salary, department_id.
Furthermore, for the department database (db_dept) as shown below. The department database consists of 2 columns namely id and department.
After knowing the data related to the data that will be used, then look for the difference in salaries for the engineering and marketing department.
import pandas as pd
import numpy as np
df = pd.merge(db_employee, db_dept, how ='left',
left_on=['department_id'], right_on=['id'])df1 = df[df['department']== 'engineering']df_eng = df1.groupby('department')['salary'].max().reset_index(
name='eng_salary')df2 = df[df['department'] == 'marketing']df_mkt = df2.groupby('department')['salary'].max().reset_index(name = 'mkt_salary')result = pd.DataFrame( df_mkt['mkt_salary'] - df_eng['salary'])
result.columns= ['salary_diff']
result
The Output from the merge process
df = pd.merge(db_employee, db_dept, how ='left',
left_on=['department_id'], right_on=['id'])
print(df)
The merge process above is the same as when using SQL queries, namely by using the LEFT JOIN command.
The next process is filtering because it will only look for differences in salaries for employees of the marketing and engineering divisions, so the following command can be carried out.
df1 = df[df['department']== 'engineering']
df2 = df[df['department'] == 'marketing']
The next stage is to look for the difference in salary between the two divisions, but before that, the max value for each division is first searched after finding the difference between the highest salary for the two divisions.
df_eng = df1.groupby('department')['salary'].max().reset_index(
name='eng_salary')df_mkt = df2.groupby('department')['salary'].max().reset_index(name = 'mkt_salary')
Here are the results of the command code above.
Furthermore, the results of the differences between the two divisions are 48187–45787 = 2400 or can be searched using the command below.
result = pd.DataFrame( df_mkt['mkt_salary'] - df_eng['salary'])