Try to optimize the number of temporaries to 16 or fewer per shader. This limits the number of register transfers to and from main memory, which has higher latency costs. Check the instruction set assembly code output and look for spill count. Spills are a good opportunity for optimization as they reduce the number of operations that depend on high latency memory operations. This can be done in Intel GPA by selecting a shader and choosing to look at the machine code generated by the compiler.
If possible, move the declaration and assignment of a variable closer to where it will be referenced.
Weigh the options between full and partial precision on variables as this can store more values in the same space. Use caution when mixing partial precision with full precision in the same instruction as it may cause redundant type conversions.
Move redundant code that is common between branches out of the branch. This can reduce redundant variable duplication.
Avoid non-uniform access to constant buffer/buffer data. Non-uniform access requires more temporary registers to store data per SIMD lane.
Avoid control flow decision based on constant buffer data, as this forces the compiler to generate sub-optimal machine code. Instead, use specialization constants, or generate multiple specialized shader permutations.