In this work we present a runtime threading system that provides an efficient substrate for fine-grain parallelism and is suitable for deployment on multicore platforms. Its architecture incorporates a number of optimizations that make it particularly effective at managing large numbers of threads with low overhead. The runtime system has been integrated into an OpenMP implementation to allow transparent use under a high-level programming paradigm. We evaluate our implementation on two multicore systems using synthetic microbenchmarks and a real-time face detection application.