// do-file: Licensing_SOC_Assignment // task: Occupational Assignment for Public Licensing Agency database records // project: Workforce Data Quality Initiative Grant 2017: Occupation Assignment Engine // author: Thea Evans, Indiana Business Research Center // date: 8/15/2017 // stata version 14 // program setup // macro drop eliminates all global and local macros macro drop // set more eliminates having to continuously hit "more" in output set more off *************************************************************************************************** *************************************************************************************************** *Replace the file pathway in quotes with the location of the folder on your machine containing the files cd "C:\Occupation_Assignment\Licensing_SOC_matching\PLA_Knowledge_Transfer\Stata" *Replace the year variable with your 4 digit year of interest global year = 2013 *Replace the SOC code vintage year. This will be either 2000, 2010, or 2018 depending upon the year of interest global socyr = 2010 *Replace the number variable with the number of jobs per person that are contained in the wage records global jobnum = 3 *Indicate whether your licensing data contains licenses for bartenders. Type 1 for 'yes', or 0 for 'no' after the equal sign global bartender = 1 *Indicate whether your licensing data contains licenses for engineers. Type 1 for 'yes', or 0 for 'no' after the equal sign global engineer = 1 *Indicate whether your licensing data contains licenses for asbestos workers. Type 1 for 'yes', or 0 for 'no' after the equal sign global asbestos = 1 *Indicate whether your licensing data contains licenses for nurses and/or nurse aides. Type 1 for 'yes', or 0 for 'no' after the equal sign global nurse = 1 *Indicate whether your licensing data contains licenses for registered nurses. Type 1 for 'yes', or 0 for 'no' after the equal sign global regnurse = 1 *Indicate whether your licensing data contains licenses for nurse practitioners. Type 1 for 'yes', or 0 for 'no' after the equal sign global nursepract = 0 *Indicate whether your licensing data contains licenses for nurse prescriptive authority (APN Prescriptive Authority or CSR Prescriptive Authority) /// Type 1 for 'yes', or 0 for 'no' after the equal sign global rx = 1 *Indicate whether your licensing data contains licenses for cosmetologists. Type 1 for 'yes', or 0 for 'no' after the equal sign global cosmetologist = 1 ********************************************************************************************************* *This section processes the Licensing to SOC crosswalk and merges it to the OES staffing patterns for a /// particular year. The output is one record for each potential SOC code associated with an occupational /// license, and all of the NAICS industries associated with that SOC code for the year in question. *Import the OES staffing patterns data for the year of interest import excel nat4d_M${year}_dl.xls, firstrow *Rename the OCC_CODE variable to match the SOC code variable from the licensing crosswalk file rename OCC_CODE soc$socyr *Create a variable that contains both the SOC code and the NAICS code. This will be used in a later section /// to join the salary information for occuapation and industry combinations to the licensing data to determine /// which potential occupations for a person are a more likely match based upon the salary. egen socnaics = concat(soc${socyr} NAICS), p(|) *Cleaning up the estimated values that aren't available (*) or are above the threshold level (#) replace A_PCT10 = "" if A_PCT10=="*" replace A_PCT25 = "" if A_PCT25=="*" replace A_MEDIAN = "" if A_MEDIAN=="*" replace A_MEDIAN = "" if A_PCT75=="*" replace A_PCT90 = "" if A_PCT90=="*" replace A_PCT10 = "187201" if A_PCT10=="#" replace A_PCT25 = "187201" if A_PCT25=="#" replace A_MEDIAN = "187201" if A_MEDIAN=="#" replace A_PCT75 = "999999" if A_PCT75=="#" replace A_PCT90 = "999999" if A_PCT90=="#" *Converting the wage variables from string to numeric format foreach v of var A_PCT10 A_PCT25 A_MEDIAN A_MEDIAN A_PCT75 A_PCT90 { destring `v', replace } *Save the OES staffing patten data as a Stata file save nat4d_M${year}_dl.dta clear *Import the excel License crosswalk to SOC code import excel license_soc_crosswalk.xlsx, sheet("Sheet1") firstrow *Determine which year of SOC codes and titles you need, then drop the data for the other years, /// and remove any duplicate records for the year of interest keep license_type profession_name LICENSEID LICTITLE ONETCODE ONETTITLE soc$socyr title$socyr *Identify and drop duplicate records unab vlist : _all sort `vlist' tempvar duplicate quietly by `vlist': gen `duplicate' = cond(_N==1,0,_n) drop if `duplicate'>1 drop `duplicate' *Create an Split variable that will allow the dataset to be split into separate groups, /// which do not contain any repeating SOC codes by soc$socyr, sort: generate splitid = _n label var splitid "Split ID" *Save the License to SOC crosswalk file save license_soc_crosswalk.dta *Summarize the split variable to determine how many separate files are necessary to avoid repeating SOC codes sum splitid global max = `r(max)' *Split the data into the number of files according to the splitid tabulation. These will save to the working directory. *Loop through the file splits to join the industries of employment to each occupation for each license, and drop non-matches forvalues i = 1/$max { use license_soc_crosswalk if splitid == `i', clear merge 1:m soc$socyr using nat4d_M${year}_dl.dta, keepusing(NAICS TOT_EMP) generate(match) drop if match==2 save splitid`i' } *Append all the separate files together use splitid1, clear forvalues i = 2/$max { append using splitid`i', nolabel } *Drop the unecessary match variable drop match *Set up data to reshape the format from long to wide rename LICTITLE lictitle rename TOT_EMP totemp replace totemp = "" if totemp=="**" destring totemp, replace egen licsocgrp = group(profession_name lictitle soc$socyr) drop if soc$socyr=="" gsort +licsocgrp -totemp by licsocgrp: generate reshapeid = _n *Summarize the reshapeid variable to determine how many columns of NAICS codes will be created in the reshape process sum reshapeid global col = `r(max)' *Reshape format of data from long to wide, so that there is only one row per license and occupation combination. *Each row will contain multiple variables for the possible NAICS industries of employment associated with that /// license and occupation keep profession_name lictitle soc$socyr title$socyr NAICS licsocgrp reshapeid reshape wide NAICS, i(licsocgrp) j(reshapeid) *Save the dataset. This name will depend on the year of the staffing pattern data save License_NAICS${year}.dta clear ********************************************************************************************************* *This section works with the Licensing data, and determines whether a license was legally able to be used at any /// point during the year based upon the License status, the date of issue, and the expiration date. ************* *NOTE: The licensing data must be processed to fit the exact field specifications listed in the instructions file ************* *Import micro license records in csv format insheet using microlicensedata.csv, delimiter("|") *Label variables in Licensing dataset label var universal_id "Universal ID" label var profession_name "Professional licensing board issuing the license" label var license_type "License title" label var issue_date "Date of license issue" label var expiration_date "Date of license expiration" label var status_date "Date of license status update" label var license_status "License status" label variable licensernk "Condensed License Rank" label define licensernk 1 "Active" 2 "Conditional Active" 3 "Superceded" 4 "Not Practicing/Retired" /// 5 "Expired" 6 "Suspended" 8 "Not Awarded" 7 "Revoked" 9 "Pending Application" 99 "Unassigned", add label values licensernk licensernk *Creating license dates in the STATA format replace status_date = substr(status_date,1,10) gen statusdate = date(status_date, "YMD") format statusdate %d label var statusdate "Stata format status date" gen issuedate = date(issue_date, "YMD") format issuedate %d label var issuedate "Stata format issue date" gen expirationdate = date(expiration_date, "YMD") format expirationdate %d label var expirationdate "Stata format expiration date" *Generate variables to indicate whether a license was active during each quarter gen activemonth = 0 foreach mo in jan feb mar { replace activemonth = 1 + activemonth if licensernk==1 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==2 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==3 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==4 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==5 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==6 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==7 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) } gen active${year}q1 = activemonth replace activemonth = 0 foreach mo in apr may jun { replace activemonth = 1 + activemonth if licensernk==1 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==2 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==3 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==4 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==5 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==6 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==7 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) } gen active${year}q2 = activemonth replace activemonth = 0 foreach mo in jul aug sep { replace activemonth = 1 + activemonth if licensernk==1 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==2 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==3 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==4 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==5 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==6 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==7 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) } gen active${year}q3 = activemonth replace activemonth = 0 foreach mo in oct nov dec { replace activemonth = 1 + activemonth if licensernk==1 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==2 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==3 & issuedate<=td(28`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==4 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==5 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==6 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) | /// licensernk==7 & issuedate<=td(28`mo'${year}) & statusdate>=td(01`mo'${year}) & expirationdate>=td(27`mo'${year}) } gen active${year}q4 = activemonth drop activemonth *Label active license variables foreach qrt in q1 q2 q3 q4 { label var active${year}`qrt' "Months with active license in ${year}`qrt'" } *Drop records without an active license for the year egen dropflag = rowtotal(active${year}q1-active${year}q4) drop if dropflag==0 drop dropflag *Identify and drop duplicate records unab vlist : _all sort `vlist' quietly by `vlist': gen dup = cond(_N==1,0,_n) drop if dup>1 drop dup *Generate a license id per person by universal_id, sort: gen licenseperid = _n label var licenseperid "ID of active license per person" *Generate a total license id per person by universal_id, sort: gen licensetot = _N label var licenseperid "Total number of active licenses per person" *Save the dataset save microlicensedata${year}.dta *Perform pairwise join to merge the Licensing data from PLA with the occupations and industries for each license rename license_type lictitle sort lictitle joinby lictitle using License_NAICS${year}.dta, unmatched(master) *Save the dataset save LicenseOccNAICS${year}.dta clear ************************************************************************************************************* *This section merges the prepared License-SOC-NAICS dataset to the wage records based upon the universal id variable *Import the wage records and save as a Stata file insheet using wagerecs${year}.csv, delimiter("|") save wagerecs${year}.dta clear use LicenseOccNAICS${year}.dta *Merge the the wage record data to the Licensing data (PLA). *This is a pairwise join since it is possible for an individual to hold more than one license, a license /// can have more than one occupation associated with a license, and a license holder may not be employed in the /// state in which they hold an active license *NOTE: This type of joinby where no matching option is specified will automatically drop records for: /// 1. Individual active license holders that are not in the wage records for the year (not on a payroll) /// 2. Individuals in the wage records that are not active license holders for the year sort universal_id joinby universal_id using wagerecs${year}.dta *Check to see whether there is a match between the quarterly NAICS code in the wage record and the list of /// possible NAICS codes associated with the active license based upon the OES staffing patterns data forvalues j = 1/$jobnum { forvalues q = 1/4 { gen match${year}q`q'_j`j' = . tostring naics${year}q`q'_j`j', gen(naics${year}q`q'_j`j'st) gen naics${year}q`q'_j`j'ss = substr(naics${year}q`q'_j`j'st,1,4) gen naics${year}q`q'_j`j'4d = naics${year}q`q'_j`j'ss + "00" drop naics${year}q`q'_j`j'st naics${year}q`q'_j`j'ss foreach num of numlist 1/$col { // if there is a "invalid numlist" or "invalid syntax" when running this loop, it is likely due to the global col variable defined earlier being cleared from the session memory. You will need to reopen the License_NAICSYYYY.dta file and replace the $col text with the number of the last NAICSnnn variable. replace match${year}q`q'_j`j'=1 if naics${year}q`q'_j`j'!=. & naics${year}q`q'_j`j'4d==NAICS`num' } replace match${year}q`q'_j`j' = . if active${year}q`q'==0 label var match${year}q`q'_j`j' "License and NAICS match for Job`j' Q`q' ${year}" } } *Drop the NAICS variables now that the NAICS matches between the staffing patterns and wage records have been determined foreach num of numlist 1/$col { drop NAICS`num' } *Save the file save LicenseOccNAICS${year}.dta, replace *There is a known issue with people being overassigned as bartenders due to the number of people that previously worked at an occupation /// requiring an Alcoholic Beverage license and have since kept their license active. /// This loop will will limit the Industry and License match variables for the quarter and job. Some the occupations associated with an /// Alcoholic Beverage license (food service managers and first line supervisors of restaurant staff) have industry matches /// where no alcohol is served. Therefore, the license is not an adequate determiner for the occupation of employment within those industries. forvalues j = 1/$jobnum { forvalues q = 1/4 { replace match${year}q`q'_j`j' = . if ${bartender}==1 & match${year}q`q'_j`j'==1 & lictitle=="Employee Permit" & /// naics${year}q`q'_j`j'!=312120 & naics${year}q`q'_j`j'!=312130 & naics${year}q`q'_j`j'!=312140 & naics${year}q`q'_j`j'!=561320 & /// naics${year}q`q'_j`j'!=711110 & naics${year}q`q'_j`j'!=711190 & naics${year}q`q'_j`j'!=711211 & naics${year}q`q'_j`j'!=711212 & /// naics${year}q`q'_j`j'!=711219 & naics${year}q`q'_j`j'!=711310 & naics${year}q`q'_j`j'!=711320 & /// naics${year}q`q'_j`j'!=713210 & naics${year}q`q'_j`j'!=713290 & naics${year}q`q'_j`j'!=713910 & naics${year}q`q'_j`j'!=713920 & /// naics${year}q`q'_j`j'!=713950 & naics${year}q`q'_j`j'!=713990 & naics${year}q`q'_j`j'!=721110 & naics${year}q`q'_j`j'!=721120 & /// naics${year}q`q'_j`j'!=722310 & naics${year}q`q'_j`j'!=722320 & naics${year}q`q'_j`j'!=722330 & naics${year}q`q'_j`j'!=722410 & /// naics${year}q`q'_j`j'!=722511 & naics${year}q`q'_j`j'!=722513 & naics${year}q`q'_j`j'!=722514 & naics${year}q`q'_j`j'!=813410 } } *Suppress the Cosmetologist license for individuals that also possess a RN, LPN, or Nursing Aide license and work in /// Ambulatory Health Care Services (NAICS 621), Hospitals (NAICS 622), or Nursing and Residential Care Facilities (NAICS 623). forvalues q = 1/4 { gen cosmq`q' = 1 if lictitle=="Cosmetologist" & active${year}q`q'>=1 gen nrseq`q' = 1 if profession_name=="Nursing Board" & active${year}q`q'>=1 | profession_name=="Nurse Aides" & active${year}q`q'>=1 bysort universal_id (cosmq`q'): carryforward cosmq`q', replace bysort universal_id (nrseq`q'): carryforward nrseq`q', replace egen suppflagq`q' = rowtotal(cosmq`q' nrseq`q') replace suppflagq`q' = 0 if suppflagq`q'==1 replace suppflagq`q' = 1 if suppflagq`q'==2 } forvalues j = 1/$jobnum { forvalues q = 1/4 { tostring naics${year}q`q'_j`j', gen(naics${year}q`q'_j`j'x) gen naics${year}q`q'_j`j'3d = substr(naics${year}q`q'_j`j'x,1,3) destring naics${year}q`q'_j`j'3d, replace drop naics${year}q`q'_j`j'x replace match${year}q`q'_j`j' = . if suppflagq`q'==1 & naics${year}q`q'_j`j'3d==622 & lictitle=="Cosmetologist" & ${cosmetologist}==1 & ${nurse}==1 replace match${year}q`q'_j`j' = . if suppflagq`q'==1 & naics${year}q`q'_j`j'3d==623 & lictitle=="Cosmetologist" & ${cosmetologist}==1 & ${nurse}==1 replace match${year}q`q'_j`j' = . if suppflagq`q'==1 & naics${year}q`q'_j`j'3d==621 & lictitle=="Cosmetologist" & ${cosmetologist}==1 & ${nurse}==1 drop naics${year}q`q'_j`j'3d } } forvalues q = 1/4 { drop cosmq`q' nrseq`q' suppflagq`q' } *Identify persons that have similar active licenses that result in the same SOC assignment after matching /// the dominant NAICS for the quarter. 0 = not a duplicate or first instance of a duplicate SOC | 1+ = duplicate SOC forvalues j = 1/$jobnum { forvalues q = 1/4 { sort match${year}q`q'_j`j' universal_id soc${socyr} quietly by match${year}q`q'_j`j' universal_id soc${socyr}, sort: gen dupsocq`q'_j`j' = cond(_N==1,0,_n) replace dupsocq`q'_j`j' = dupsocq`q'_j`j'-1 if dupsocq`q'_j`j'>=1 label var dupsocq`q'_j`j' "Q`q' Job`j' Duplicate SOC code for NAICS matches" } } *Count of potential occupation codes based upon license and industry of employment for each quarter and each job forvalues j = 1/$jobnum { forvalues q = 1/4 { bysort match${year}q`q'_j`j' universal_id: generate indsocq`q'_j`j' = _N bysort match${year}q`q'_j`j' universal_id: egen numdupsq`q'_j`j' = count(dupsocq`q'_j`j') if dupsocq`q'_j`j'>=1 sort match${year}q`q'_j`j' universal_id numdupsq`q'_j`j' by match${year}q`q'_j`j' universal_id: carryforward numdupsq`q'_j`j', replace replace indsocq`q'_j`j' = indsocq`q'_j`j' - numdupsq`q'_j`j' if numdupsq`q'_j`j'!=. replace indsocq`q'_j`j' = . if match${year}q`q'_j`j'==. label var indsocq`q'_j`j' "Q`q' Job`j' Count of Industry-Occupation matches per person" } } *Drop the records with non-matching NAICS and SOC combinations for all quarters and all jobs egen NAICSmatch$year = rowtotal(match${year}q1_j1 match${year}q2_j1 match${year}q3_j1 match${year}q4_j1 /// match${year}q1_j2 match${year}q2_j2 match${year}q3_j2 match${year}q4_j2 /// match${year}q1_j3 match${year}q2_j3 match${year}q3_j3 match${year}q4_j3) drop if NAICSmatch$year==0 label var NAICSmatch$year "Number of Quarters and Jobs with License and NAICS match for the entire year" *Generate a possible SOC id per person bysort universal_id (licenseperid): generate socperid = _n *Create varaibles to identify potential occupation and NAICS combinations. will be used to join the salary data forvalues j = 1/$jobnum { forvalues q = 1/4 { egen socnaicsq`q'_j`j' = concat(soc${socyr} naics${year}q`q'_j`j'4d), p(|) replace socnaicsq`q'_j`j' = "" if naics${year}q`q'_j`j'==. | match${year}q`q'_j`j'==. drop naics${year}q`q'_j`j'4d } } *Join the salary information from the national staffing patterns to determine the salary for each SOC-NAICS combination drop _merge forvalues j = 1/$jobnum { forvalues q = 1/4 { rename socnaicsq`q'_j`j' socnaics joinby socnaics using nat4d_M${year}_dl.dta, unmatched(master) rename socnaics socnaicsq`q'_j`j' rename A_PCT10 annpct10q`q'_j`j' rename A_PCT25 annpct25q`q'_j`j' rename A_MEDIAN annmedq`q'_j`j' rename A_PCT75 annpct75q`q'_j`j' rename A_PCT90 annpct90q`q'_j`j' drop _merge NAICS NAICS_TITLE OCC_TITLE OCC_GROUP TOT_EMP EMP_PRSE PCT_TOTAL PCT_RPT H_MEAN A_MEAN MEAN_PRSE H_PCT10 H_PCT25 H_MEDIAN H_PCT75 H_PCT90 ANNUAL HOURLY } } *Generate a variable that represents the assigned SOC code for the quarter and job combination forvalues j = 1/$jobnum { forvalues q = 1/4 { gen socq`q'_j`j' = soc$socyr if indsocq`q'_j`j'==1 & match${year}q`q'_j`j'==1 gen soclvlq`q'_j`j' = 6 if indsocq`q'_j`j'==1 & match${year}q`q'_j`j'==1 } } *Working through different scenarios to assign a 5,4, or 3 digit SOC code when there are multiple potential SOC codes forvalues j = 1/$jobnum { forvalues q = 1/4 { *5-Digit gen SOC5dig = substr(soc$socyr,1,6) if match${year}q`q'_j`j'==1 by universal_id match${year}q`q'_j`j' (SOC5dig), sort: gen same5dig = SOC5dig[1] == SOC5dig[_N] gen SOC5digmch = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same5dig==1 & SOC5dig!="" *4-Digit gen SOC4dig = substr(soc$socyr,1,5) if match${year}q`q'_j`j'==1 by universal_id (SOC4dig), sort: gen same4dig = SOC4dig[1] == SOC4dig[_N] gen SOC4digmch = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same4dig==1 & SOC5digmch==. & SOC4dig!="" *3-Digit gen SOC3dig = substr(soc$socyr,1,4) if match${year}q`q'_j`j'==1 by universal_id (SOC3dig), sort: gen same3dig = SOC3dig[1] == SOC3dig[_N] gen SOC3digmch = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same3dig==1 & SOC5digmch==. & SOC4digmch==. & SOC3dig!="" *Filling in the soc codes and soc level codes for records with 5, 4, or 3 digit soc matches replace soclvlq`q'_j`j' = 5 if socq`q'_j`j'=="" & SOC5digmch==1 replace socq`q'_j`j' = SOC5dig + "0" if soclvlq`q'_j`j'==5 replace soclvlq`q'_j`j' = 4 if socq`q'_j`j'=="" & SOC4digmch==1 replace socq`q'_j`j' = SOC4dig + "00" if soclvlq`q'_j`j'==4 replace soclvlq`q'_j`j' = 3 if socq`q'_j`j'=="" & SOC3digmch==1 replace socq`q'_j`j' = SOC3dig + "000" if soclvlq`q'_j`j'==3 gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" drop SOC5dig same5dig SOC5digmch SOC4dig same4dig SOC4digmch SOC3dig same3dig SOC3digmch } } *Determine whether the annualized wage for a person falls within the 10%-90% salary range for the SOC_NAICS combination /// This is used to eliminate potential soc codes for a person that doesnt have a suitable 5, 4, or 3 digit code. It was not /// used before to eliminate soc codes because we can only assume that the person is working full time to determine the annual /// dominant wage. forvalues j = 1/$jobnum { forvalues q = 1/4 { gen wgmatchq`q'_j`j' = 1 if wg${year}q`q'_j`j'>=(annpct10q`q'_j`j'/4) & wg${year}q`q'_j`j'<=(annpct90q`q'_j`j'/4) & match${year}q`q'_j`j'==1 } } *Generate the possible number of SOC codes per person after suppressing SOC codes that don't fit into the 10th to 90th percentile wage range forvalues j = 1/$jobnum { forvalues q = 1/4 { by universal_id wgmatchq`q'_j`j', sort: generate wgsocidq`q'_j`j' = _N } } *Assign 6-digit codes for those records that only have one potential SOC code after suppressing the occupations that have /// ranges outside of the dominant wage for the record forvalues j = 1/$jobnum { forvalues q = 1/4 { replace socq`q'_j`j' = soc$socyr if socq`q'_j`j'=="" & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'==1 & match${year}q`q'_j`j'==1 replace soclvlq`q'_j`j' = 61 if soclvlq`q'_j`j'==. & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'==1 & match${year}q`q'_j`j'==1 by universal_id wgmatchq`q'_j`j', sort: gen same6dig = soc${socyr}[1] == soc${socyr}[_N] replace socq`q'_j`j' = soc$socyr if socq`q'_j`j'=="" & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'>=2 & same6dig==1 replace soclvlq`q'_j`j' = 61 if soclvlq`q'_j`j'==. & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'>=2 & same6dig==1 gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" drop same6dig } } *After suppressing occupations that fall outside of the 10%-90% salary range, this will test to assign a 5,4, or 3 digit /// SOC code when there are multiple potential 6 digit SOC codes forvalues j = 1/$jobnum { forvalues q = 1/4 { *5-Digit gen SOC5digwg = substr(soc${socyr},1,6) if match${year}q`q'_j`j'==1 & wgmatchq`q'_j`j'==1 by universal_id wgmatchq`q'_j`j' (SOC5digwg), sort: gen same5digwg = SOC5digwg[1] == SOC5digwg[_N] gen SOC5digmchwg = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same5digwg==1 & wgmatchq`q'_j`j'==1 & socq`q'_j`j'=="" & SOC5digwg!="" *4-Digit gen SOC4digwg = substr(soc${socyr},1,5) if match${year}q`q'_j`j'==1 & wgmatchq`q'_j`j'==1 by universal_id wgmatchq`q'_j`j' (SOC4digwg), sort: gen same4digwg = SOC4digwg[1] == SOC4digwg[_N] gen SOC4digmchwg = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same4digwg==1 & wgmatchq`q'_j`j'==1 & socq`q'_j`j'=="" & SOC5digmchwg==. & SOC4digwg!="" *3-Digit gen SOC3digwg = substr(soc${socyr},1,4) if match${year}q`q'_j`j'==1 & wgmatchq`q'_j`j'==1 by universal_id wgmatchq`q'_j`j' (SOC3digwg), sort: gen same3digwg = SOC3digwg[1] == SOC3digwg[_N] gen SOC3digmchwg = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same3digwg==1 & wgmatchq`q'_j`j'==1 & socq`q'_j`j'=="" & SOC5digmchwg==. & SOC4digmchwg==. & SOC3digwg!="" *Filling in the soc codes and soc level codes for records with 5, 4, or 3 digit soc matches replace soclvlq`q'_j`j' = 51 if socq`q'_j`j'=="" & SOC5digmchwg==1 replace socq`q'_j`j' = SOC5digwg + "0" if soclvlq`q'_j`j'==51 replace soclvlq`q'_j`j' = 41 if socq`q'_j`j'=="" & SOC4digmchwg==1 replace socq`q'_j`j' = SOC4digwg + "00" if soclvlq`q'_j`j'==41 replace soclvlq`q'_j`j' = 31 if socq`q'_j`j'=="" & SOC3digmchwg==1 replace socq`q'_j`j' = SOC3digwg + "000" if soclvlq`q'_j`j'==31 drop SOC5digwg same5digwg SOC5digmchwg SOC4digwg same4digwg SOC4digmchwg SOC3digwg same3digwg SOC3digmchwg gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" } } *Generate variables to determine whether the annual dominant wage fits within the 25% - 75% wage range each quarter. /// This will help eliminate occupations that are not as suitable as other license/industry/wage/occupation matches. forvalues j = 1/$jobnum { forvalues q = 1/4 { drop wgmatchq`q'_j`j' gen wgmatchq`q'_j`j' = 1 if wg${year}q`q'_j`j'>=(annpct25q`q'_j`j'/4) & wg${year}q`q'_j`j'<=(annpct75q`q'_j`j'/4) & match${year}q`q'_j`j'==1 } } *Generate the possible number of SOC codes per person after suppressing SOC codes that dont fit into the 25th to 75th percentile wage range forvalues j = 1/$jobnum { forvalues q = 1/4 { drop wgsocidq`q'_j`j' by universal_id wgmatchq`q'_j`j', sort: generate wgsocidq`q'_j`j' = _N } } *Assign 6-digit codes for those records that only have one potential SOC code after suppressing the occupations that have /// ranges outside of the 25th to 75th percentile wage range for the record forvalues j = 1/$jobnum { forvalues q = 1/4 { replace socq`q'_j`j' = soc$socyr if socq`q'_j`j'=="" & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'==1 & match${year}q`q'_j`j'==1 replace soclvlq`q'_j`j' = 62 if soclvlq`q'_j`j'==. & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'==1 & match${year}q`q'_j`j'==1 by universal_id wgmatchq`q'_j`j', sort: gen same6dig = soc${socyr}[1] == soc${socyr}[_N] replace socq`q'_j`j' = soc$socyr if socq`q'_j`j'=="" & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'>=2 & same6dig==1 replace soclvlq`q'_j`j' = 62 if soclvlq`q'_j`j'==. & wgmatchq`q'_j`j'==1 & wgsocidq`q'_j`j'>=2 & same6dig==1 gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" drop same6dig } } *After suppressing occupations that fall outside of the 25th to 75th percentile salary range, this will test to assign a 5,4, or 3 digit /// SOC codes when there are multiple potential 6 digit SOC codes forvalues j = 1/$jobnum { forvalues q = 1/4 { *5-Digit gen SOC5digwg = substr(soc${socyr},1,6) if match${year}q`q'_j`j'==1 & wgmatchq`q'_j`j'==1 by universal_id wgmatchq`q'_j`j' (SOC5digwg), sort: gen same5digwg = SOC5digwg[1] == SOC5digwg[_N] gen SOC5digmchwg = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same5digwg==1 & wgmatchq`q'_j`j'==1 & socq`q'_j`j'=="" & SOC5digwg!="" *4-Digit gen SOC4digwg = substr(soc${socyr},1,5) if match${year}q`q'_j`j'==1 & wgmatchq`q'_j`j'==1 by universal_id wgmatchq`q'_j`j' (SOC4digwg), sort: gen same4digwg = SOC4digwg[1] == SOC4digwg[_N] gen SOC4digmchwg = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same4digwg==1 & wgmatchq`q'_j`j'==1 & socq`q'_j`j'=="" & SOC5digmchwg==. & SOC4digwg!="" *3-Digit gen SOC3digwg = substr(soc${socyr},1,4) if match${year}q`q'_j`j'==1 & wgmatchq`q'_j`j'==1 by universal_id wgmatchq`q'_j`j' (SOC3digwg), sort: gen same3digwg = SOC3digwg[1] == SOC3digwg[_N] gen SOC3digmchwg = 1 if match${year}q`q'_j`j'==1 & indsocq`q'_j`j'>=2 & same3digwg==1 & wgmatchq`q'_j`j'==1 & socq`q'_j`j'=="" & SOC5digmchwg==. & SOC4digmchwg==. & SOC3digwg!="" *Filling in the SOC codes and SOC level codes for records with 5, 4, or 3 digit SOC matches replace soclvlq`q'_j`j' = 52 if socq`q'_j`j'=="" & SOC5digmchwg==1 replace socq`q'_j`j' = SOC5digwg + "0" if soclvlq`q'_j`j'==52 replace soclvlq`q'_j`j' = 42 if socq`q'_j`j'=="" & SOC4digmchwg==1 replace socq`q'_j`j' = SOC4digwg + "00" if soclvlq`q'_j`j'==42 replace soclvlq`q'_j`j' = 32 if socq`q'_j`j'=="" & SOC3digmchwg==1 replace socq`q'_j`j' = SOC3digwg + "000" if soclvlq`q'_j`j'==32 drop SOC5digwg same5digwg SOC5digmchwg SOC4digwg same4digwg SOC4digmchwg SOC3digwg same3digwg SOC3digmchwg gsort universal_id -soclvlq`q'_j`j' -socq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" } } *Generate variables to show how far from the potential median salaries differ from the annualized quarterly wages forvalues j = 1/$jobnum { forvalues q = 1/4 { gen wgtomedq`q'_j`j' = abs((wg${year}q`q'_j`j'*4)-annmedq`q'_j`j') by universal_id, sort: egen wgmedrnkq`q'_j`j' = rank(wgtomedq`q'_j`j') by universal_id, sort: egen wgmedminq`q'_j`j' = min(wgtomedq`q'_j`j') by universal_id, sort: egen wgmedmaxq`q'_j`j' = max(wgtomedq`q'_j`j') } } ******************************************************************************************************************************** *This section deals only with resolving occupation assignment conflicts for certain license types. Therefore, the different /// sections of code will only run based upon the answers that you provided at the begining of the file that indicated whether /// these licenses are present in the dataset you are using. *For engineers without SOC assignments: /// If the engineering management occupation median wage is closest to the annualized wage of the individual's wage record, /// then it will be assigned for that quarter and job. Otherwise, the 3-digit (17-2000) engineering code will be assigned. forvalues j = 1/$jobnum { forvalues q = 1/4 { replace soclvlq`q'_j`j' = 63 if ${engineer}==1 & profession_name=="Engineer Board" & soclvlq`q'_j`j'==. & wgmatchq`q'_j`j'==1 & /// match${year}q`q'_j`j'==1 & soc${socyr}=="11-9041" & wgmedrnkq`q'_j`j'==1 replace socq`q'_j`j' = soc$socyr if ${engineer}==1 & profession_name=="Engineer Board" & soclvlq`q'_j`j'==63 & wgmatchq`q'_j`j'==1 & /// match${year}q`q'_j`j'==1 & soc${socyr}=="11-9041" & wgmedrnkq`q'_j`j'==1 replace soclvlq`q'_j`j' = 33 if ${engineer}==1 & profession_name=="Engineer Board" & soclvlq`q'_j`j'==. & /// match${year}q`q'_j`j'==1 & soc${socyr}=="11-9041" & wgmedrnkq`q'_j`j'!=1 replace socq`q'_j`j' = "17-2000" if ${engineer}==1 & profession_name=="Engineer Board" & soclvlq`q'_j`j'==33 & /// match${year}q`q'_j`j'==1 & soc${socyr}=="11-9041" gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" } } *There are some known issues with Engineers being over assigned the 17-2171 (Petroleum Engineer) occupation due to its high salary range, /// which is similar to the salary range of 11-9041 (Architectural and Engineering Managers). /// Individuals holding Professional Engineering licenses with salaries that match the ranges associated with both Petroleum Engineers /// and Architectural and Engineering Managers (11-9041), will be assigned the 11-9041 occupation code UNLESS they work in one of the following industries: /// NAICS 211110 (Oil and Gas Extraction), NAICS 213112 (Support Activities for Oil and Gas Operations), NAICS 486000 (Pipeline Transportation), /// NAICS 221200 (Natural Gas Distribution), or NAICS 324000 (Petroleum and Coal Products Manufacturing) forvalues j = 1/$jobnum { forvalues q = 1/4 { replace soclvlq`q'_j`j' = 63 if ${engineer}==1 & profession_name=="Engineer Board" & socq`q'_j`j'=="17-2171" & wgmedrnkq`q'_j`j'<=2 & /// match${year}q`q'_j`j'==1 & soc${socyr}=="11-9041" & naics${year}q`q'_j`j'!=211111 & naics${year}q`q'_j`j'!=211112 & /// naics${year}q`q'_j`j'!=213111 & naics${year}q`q'_j`j'!=213112 & naics${year}q`q'_j`j'!=486110 & naics${year}q`q'_j`j'!=486210 /// & naics${year}q`q'_j`j'!=486910 & naics${year}q`q'_j`j'!=221210 & naics${year}q`q'_j`j'!=340110 & naics${year}q`q'_j`j'!=324121 /// & naics${year}q`q'_j`j'!=324122 & naics${year}q`q'_j`j'!=324191 & naics${year}q`q'_j`j'!=342199 replace socq`q'_j`j' = "11-9041" if ${engineer}==1 & profession_name=="Engineer Board" & socq`q'_j`j'=="17-2171" & wgmedrnkq`q'_j`j'<=2 & /// match${year}q`q'_j`j'==1 & soc${socyr}=="11-9041" & naics${year}q`q'_j`j'!=211111 & naics${year}q`q'_j`j'!=211112 & /// naics${year}q`q'_j`j'!=213111 & naics${year}q`q'_j`j'!=213112 & naics${year}q`q'_j`j'!=486110 & naics${year}q`q'_j`j'!=486210 /// & naics${year}q`q'_j`j'!=486910 & naics${year}q`q'_j`j'!=221210 & naics${year}q`q'_j`j'!=340110 & naics${year}q`q'_j`j'!=324121 /// & naics${year}q`q'_j`j'!=324122 & naics${year}q`q'_j`j'!=324191 & naics${year}q`q'_j`j'!=342199 gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. & ${engineer}==1 by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" & ${engineer}==1 } } *Working through the individuals that hold an Alcoholic Beverage license to determine the most likely occupation, by assigning /// the SOC code that has a median quarterly wage closest to the wage of the individual for the quarter and job in question. forvalues j = 1/$jobnum { forvalues q = 1/4 { replace soclvlq`q'_j`j' = 64 if ${bartender}==1 & profession_name=="Alcoholic Beverage" & soclvlq`q'_j`j'==. & /// match${year}q`q'_j`j'==1 & wgmedrnkq`q'_j`j'==1 & licenseperid==1 replace socq`q'_j`j' = soc$socyr if ${bartender}==1 & profession_name=="Alcoholic Beverage" & soclvlq`q'_j`j'==64 & /// match${year}q`q'_j`j'==1 & wgmedrnkq`q'_j`j'==1 & licenseperid==1 gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" } } *Assign the Hazardous materials removal workers SOC code to the remaining Asbestos Worker license holders that are employed /// in the Remediation Services industry (NAICS 562910). forvalues j = 1/$jobnum { forvalues q = 1/4 { replace socq`q'_j`j' = soc${socyr} if ${asbestos}==1 & socq`q'_j`j'=="" & match${year}q`q'_j`j'==1 & lictitle=="Asbestos Worker" & /// soc${socyr}=="47-4041" & naics${year}q`q'_j`j'==562910 & wgsocidq`q'_j`j'>=2 replace soclvlq`q'_j`j' = 65 if ${asbestos}==1 & soclvlq`q'_j`j'==. & match${year}q`q'_j`j'==1 & lictitle=="Asbestos Worker" & /// soc${socyr}=="47-4041" & naics${year}q`q'_j`j'==562910 & wgsocidq`q'_j`j'>=2 gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" } } *Working through scenarios where there are multiple licenses but one or more were not active for all three months in the quarter /// This will suppress the license that was active for less time in the quarter, and will assign the occupation associated with the /// other license(s) that have matches with NAICS indsutry and wages. forvalues j = 1/$jobnum { forvalues q = 1/4 { gen licsupp`q'_j`j' = 1 if socq`q'_j`j'=="" & licensetot>1 & active${year}q`q'<3 by universal_id licsupp`q'_j`j', sort: generate actsocidq`q'_j`j' = _N replace socq`q'_j`j' = soc$socyr if socq`q'_j`j'=="" & licsupp`q'_j`j'!=1 & actsocidq`q'_j`j'==1 & match${year}q`q'_j`j'==1 replace soclvlq`q'_j`j' = 66 if soclvlq`q'_j`j'==. & licsupp`q'_j`j'!=1 & actsocidq`q'_j`j'==1 & match${year}q`q'_j`j'==1 by universal_id actsocidq`q'_j`j', sort: gen same6dig = soc${socyr}[1] == soc${socyr}[_N] replace socq`q'_j`j' = soc$socyr if socq`q'_j`j'=="" & licsupp`q'_j`j'!=1 & actsocidq`q'_j`j'>=2 & same6dig==1 & match${year}q`q'_j`j'==1 replace soclvlq`q'_j`j' = 66 if soclvlq`q'_j`j'==. & licsupp`q'_j`j'!=1 & actsocidq`q'_j`j'>=2 & same6dig==1 & match${year}q`q'_j`j'==1 drop same6dig gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" } } *After suppressing occupations with less than a full quarter of active status (when the individual has other licenses active for the full quarter) /// this will test to assign a 5,4, or 3 digit SOC codes when there are multiple potential 6 digit SOC codes forvalues j = 1/$jobnum { forvalues q = 1/4 { *5-Digit gen SOC5act = substr(soc${socyr},1,6) if match${year}q`q'_j`j'==1 & licsupp`q'_j`j'!=1 by universal_id licsupp`q'_j`j' (SOC5act), sort: gen same5act = SOC5act[1] == SOC5act[_N] gen SOC5mchact = 1 if match${year}q`q'_j`j'==1 & actsocidq`q'_j`j'>=2 & same5act==1 & licsupp`q'_j`j'!=1 & socq`q'_j`j'=="" & SOC5act!="" *4-Digit gen SOC4act = substr(soc${socyr},1,5) if match${year}q`q'_j`j'==1 & licsupp`q'_j`j'!=1 by universal_id licsupp`q'_j`j' (SOC4act), sort: gen same4act = SOC4act[1] == SOC4act[_N] gen SOC4mchact = 1 if match${year}q`q'_j`j'==1 & actsocidq`q'_j`j'>=2 & same4act==1 & licsupp`q'_j`j'!=1 & socq`q'_j`j'=="" & SOC5mchact==. & SOC4act!="" *3-Digit gen SOC3act = substr(soc${socyr},1,4) if match${year}q`q'_j`j'==1 & licsupp`q'_j`j'!=1 by universal_id licsupp`q'_j`j' (SOC3act), sort: gen same3act = SOC3act[1] == SOC3act[_N] gen SOC3mchact = 1 if match${year}q`q'_j`j'==1 & actsocidq`q'_j`j'>=2 & same3act==1 & licsupp`q'_j`j'!=1 & socq`q'_j`j'=="" & SOC5mchact==. & SOC4mchact==. & SOC3act!="" *Filling in the SOC codes and SOC level codes for records with 5, 4, or 3 digit SOC matches replace soclvlq`q'_j`j' = 56 if socq`q'_j`j'=="" & SOC5mchact==1 replace socq`q'_j`j' = SOC5act + "0" if soclvlq`q'_j`j'==52 replace soclvlq`q'_j`j' = 46 if socq`q'_j`j'=="" & SOC4mchact==1 replace socq`q'_j`j' = SOC4act + "00" if soclvlq`q'_j`j'==42 replace soclvlq`q'_j`j' = 36 if socq`q'_j`j'=="" & SOC3mchact==1 replace socq`q'_j`j' = SOC3act + "000" if soclvlq`q'_j`j'==32 drop SOC5act same5act SOC5mchact SOC4act same4act SOC4mchact SOC3act same3act SOC3mchact licsupp`q'_j`j' gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'[_n-1]!=. by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'[_n-1]!="" } } *This section will work through the records that pertain to Nursing Practitioners vs. Registered Nurses. Indiana does not have a specific /// license for Nursing Practitioners, but the state does issue two prescriptive authority licenses that indicate a nurse is more likely to be /// a Nurse Practitioner rather than a Registered Nurse. If a person has an active prescriptive authority license and an active registered nurse /// license, the Nurse Practitioner SOC assignment will be applied. forvalues j = 1/$jobnum { forvalues q = 1/4 { replace socq`q'_j`j' = soc${socyr} if profession_name=="Nursing Board" & licensetot>1 & soc${socyr}=="29-1171" & match${year}q`q'_j`j'==1 & /// soclvlq`q'_j`j'==4 & socq`q'_j`j'=="29-1100" & ${regnurse}==1 & ${nursepract}==0 & ${rx}==1 replace soclvlq`q'_j`j' = 67 if profession_name=="Nursing Board" & licensetot>1 & soc${socyr}=="29-1171" & match${year}q`q'_j`j'==1 & /// soclvlq`q'_j`j'==4 & socq`q'_j`j'=="29-1171" & ${regnurse}==1 & ${nursepract}==0 & ${rx}==1 gsort universal_id -soclvlq`q'_j`j' by universal_id: replace soclvlq`q'_j`j' = soclvlq`q'_j`j'[_n-1] if soclvlq`q'_j`j'==4 & profession_name=="Nursing Board" & licensetot>1 /// & socq`q'_j`j'=="29-1100" & ${regnurse}==1 & ${nursepract}==0 & ${rx}==1 by universal_id: replace socq`q'_j`j' = socq`q'_j`j'[_n-1] if socq`q'_j`j'=="29-1100" & soclvlq`q'_j`j'==67 & /// profession_name=="Nursing Board" & licensetot>1 & ${regnurse}==1 & ${nursepract}==0 & ${rx}==1 } } *Labeling the soclvl variable label def soclvl 6 "6-digit direct" 5 "5-digit direct" 4 "4-digit direct" 3 "3-digit direct" 61 "6-digit 10-90% wage match" /// 51 "5-digit 10-90% wage match" 41 "4-digit 10-90% wage match" 31 "3-digit 10-90% wage match" 62 "6-digit 25-75% wage match" /// 52 "5-digit 25-75% wage match" 42 "4-digit 25-75% wage match" 32 "3-digit 25-75% wage match" /// 63 "6-digit engineering wage tiebreak" 33 "3-digit engineering wage tiebreak" 64 "6-digit alcohol wage tiebreak" /// 65 "6-digit Asbestos tiebreak" 66 "6-digit partial quarter tiebreak" 67 "6-digit nurse practitioner tiebreak" /// 56 "5-digit partial quarter tiebreak" 46 "4-digit partial quarter tiebreak" 36 "3-digit partial quarter tiebreak" forvalues j = 1/$jobnum { forvalues q = 1/4 { label val soclvlq`q'_j`j' soclvl } } *Counting the number of SOC assignments per person per year egen assignj1 = rownonmiss(soclvlq1_j1 soclvlq2_j1 soclvlq3_j1 soclvlq4_j1) if $jobnum>=1 egen assignj2 = rownonmiss(soclvlq1_j2 soclvlq2_j2 soclvlq3_j2 soclvlq4_j2) if $jobnum>=2 egen assignj3 = rownonmiss(soclvlq1_j3 soclvlq2_j3 soclvlq3_j3 soclvlq4_j3) if $jobnum>=3 gen assigntot = assignj1 + assignj2 + assignj3 *Getting rid of all records so there is only one per individual by universal_id, sort: gen id=_n keep if id==1 *Dropping unnecessary variables forvalues j = 1/$jobnum { forvalues q = 1/4 { drop match${year}q`q'_j`j' dupsocq`q'_j`j' numdupsq`q'_j`j' indsocq`q'_j`j' socnaicsq`q'_j`j' annpct10q`q'_j`j' /// annpct25q`q'_j`j' annmedq`q'_j`j' annpct75q`q'_j`j' annpct90q`q'_j`j' wgmatchq`q'_j`j' wgsocidq`q'_j`j' /// wgtomedq`q'_j`j' wgmedrnkq`q'_j`j' wgmedminq`q'_j`j' wgmedmaxq`q'_j`j' actsocidq`q'_j`j' } } drop profession_name lictitle issue_date expiration_date status_date license_status licensernk date_of_birth statusdate /// issuedate expirationdate active${year}q1 active${year}q2 active${year}q3 active${year}q4 licenseperid licsocgrp /// soc${socyr} title${socyr} socperid id *Reshaping the data from wide to long reshape long naics${year}q1 naics${year}q2 naics${year}q3 naics${year}q4 wg${year}q1 wg${year}q2 wg${year}q3 wg${year}q4 /// socq1 socq2 socq3 socq4 soclvlq1 soclvlq2 soclvlq3 soclvlq4, i(universal_id) j(job) string replace job = "1" if job=="_j1" replace job = "2" if job=="_j2" replace job = "3" if job=="_j3" destring job, replace reshape long naics${year} wg${year} soc soclvl, i(universal_id job) j(quarter) string replace quarter = "1" if quarter=="q1" replace quarter = "2" if quarter=="q2" replace quarter = "3" if quarter=="q3" replace quarter = "4" if quarter=="q4" destring quarter, replace drop assignj1 assignj2 assignj3 rename naics${year} naics rename wg${year} wg rename NAICSmatch${year} NAICSmatchq gen year = ${year} order universal_id year job quarter naics wg soc soclvl licensetot NAICSmatchq assigntot save Licensing_SOC_Assignment${year}.dta