The simplest way to implement the CRC algorithm in software is to mirror the hardware implementation in code, replacing the shift register with a variable and the XOR gates with the XOR operator. This method is the easiest to implement and consumes very little memory. Its performance, however, is poor, because the message is processed one bit at a time.
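As an illustration, the bitwise method can be sketched as follows. The parameters are assumptions chosen for the example, not taken from the text: a 16-bit register, the polynomial 0x1021, an initial value of 0xFFFF and MSB-first processing (the parameter set commonly called CRC-16/CCITT-FALSE). The function name is hypothetical.

```python
def crc16_bitwise(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
    """Bit-at-a-time CRC that mimics a 16-bit hardware shift register.

    Assumed parameters (illustrative only): polynomial 0x1021,
    initial value 0xFFFF, MSB-first, no final XOR.
    """
    reg = init  # this variable plays the role of the shift register
    for byte in data:
        # Feed the next message byte into the top 8 bits of the register.
        reg ^= byte << 8
        for _ in range(8):
            if reg & 0x8000:
                # A 1 is shifted out: shift left, then XOR in the polynomial.
                reg = ((reg << 1) ^ poly) & 0xFFFF
            else:
                # A 0 is shifted out: a plain shift is enough.
                reg = (reg << 1) & 0xFFFF
    return reg


if __name__ == "__main__":
    # Standard check string for CRC parameter sets.
    print(hex(crc16_bitwise(b"123456789")))  # 0x29b1
```

The only state is the single integer `reg`, which is why this approach needs so little memory; the cost is eight loop iterations for every message byte.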
When CRC computation time matters, faster software algorithms should be used. These algorithms process one byte at a time rather than one bit at a time, as the hardware implementation does. Their main disadvantage is that they must keep lookup tables in memory (typically one entry for each of the 256 possible byte values) and therefore need more memory than the bitwise algorithm.
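A minimal sketch of one such byte-at-a-time algorithm is shown here. It assumes the same illustrative parameters as a common 16-bit CRC (polynomial 0x1021, initial value 0xFFFF, MSB-first, no final XOR); these values, and the names `make_crc16_table`, `CRC16_TABLE` and `crc16_table`, are assumptions for the example rather than anything prescribed by the text.

```python
def make_crc16_table(poly: int = 0x1021) -> list[int]:
    """Precompute the effect of 8 bitwise shift/XOR steps for every byte value."""
    table = []
    for value in range(256):
        reg = value << 8  # place the byte in the top 8 bits of a 16-bit register
        for _ in range(8):
            if reg & 0x8000:
                reg = ((reg << 1) ^ poly) & 0xFFFF
            else:
                reg = (reg << 1) & 0xFFFF
        table.append(reg)
    return table


# 256 entries of 16 bits each: this is the extra memory the method costs.
CRC16_TABLE = make_crc16_table()


def crc16_table(data: bytes, init: int = 0xFFFF) -> int:
    """Byte-at-a-time CRC using the precomputed lookup table."""
    reg = init
    for byte in data:
        # The top register byte combined with the message byte selects the entry.
        index = ((reg >> 8) ^ byte) & 0xFF
        # Shift the register by a whole byte and apply the precomputed XOR pattern.
        reg = ((reg << 8) & 0xFFFF) ^ CRC16_TABLE[index]
    return reg


if __name__ == "__main__":
    print(hex(crc16_table(b"123456789")))  # 0x29b1
```

Each message byte now costs one table lookup, one shift and two XORs instead of eight conditional shift steps. The table is built once, using the bitwise procedure itself, and then reused for every message.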