Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits the model's capacity to learn, and the effectiveness of the usual training algorithms to find the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger, redundant multi-branched architecture. Then, for inference, we algebraically re-parameterize the trained model into the single-branched form with fewer parameters, for a lower memory footprint and compute cost. Using this technique, we show that our always-on wake-word detector model, RepCNN, provides a good trade-off between latency and accuracy during inference. RepCNN re-parameterized models are 43% more accurate than a uni-branch convolutional model while having the same runtime. RepCNN also matches the accuracy of complex architectures like BC-ResNet, while having 2x lower peak memory usage and 10x faster runtime.
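The key property behind this kind of algebraic re-parameterization is the linearity of convolution: the outputs of parallel convolutional branches can be summed, which equals a single convolution with the summed kernels. The following is a minimal 1-D NumPy sketch of that identity, not the paper's actual fusion procedure (RepCNN's branches and kernel shapes are assumptions here):

```python
import numpy as np

# Minimal sketch: two parallel conv branches (training-time structure)
# fuse into one branch (inference-time structure), because convolution
# is linear in its kernel: conv(x, k1) + conv(x, k2) == conv(x, k1 + k2).

rng = np.random.default_rng(0)
x = rng.standard_normal(32)   # toy input signal
k1 = rng.standard_normal(3)   # branch-1 kernel (hypothetical)
k2 = rng.standard_normal(3)   # branch-2 kernel (hypothetical)

# Multi-branch computation used during training
multi_branch = np.convolve(x, k1, mode="same") + np.convolve(x, k2, mode="same")

# Single fused branch used at inference: fewer parameters, one conv
fused = np.convolve(x, k1 + k2, mode="same")

assert np.allclose(multi_branch, fused)
```

The fused model computes the exact same function as the multi-branch one, so accuracy is unchanged while the per-inference memory and compute drop to that of a single branch.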