Machine Learning Marketplace on Bitcoin
We present a novel way to build a decentralized machine learning (ML) marketplace on Bitcoin. Anyone can outsource an ML task by publishing a smart contract with a reward attached. Whoever submits the best-performing model receives the reward via a blockchain transaction, without going through a centralized authority.
How ML Competitions Work on Bitcoin
Kaggle competitions are machine learning tasks hosted by Kaggle or by other companies such as Facebook and Microsoft. Successful competitors can win monetary prizes, sometimes exceeding a million dollars.
Similar to Kaggle competitions, ML competitions on Bitcoin consist of the following steps:
- The competition host prepares the training data and a description of the problem, and publishes them by deploying a smart contract. To avoid rewarding models that merely overfit the training dataset, all submitted models are evaluated on an independent testing dataset. The host commits to this dataset beforehand (for example, by including its hash in the contract) but keeps it hidden.
- Participants download the training data from the contract and train their models off chain.
- Participants submit their models to the contract.
- The host reveals the testing dataset, and the participant whose model performs best on it is paid. The reveal can happen, for instance, when the host finds a satisfactory model, when the deadline is reached, or when the maximum number of submissions has been received.
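The four steps above can be modeled off chain as follows. This is a minimal Python sketch, not sCrypt contract code. It assumes the commitment is a SHA-256 hash over a JSON serialization of the testing dataset plus a random salt. The names `commit`, `Competition`, `submit`, and `reveal_and_pay` are hypothetical. A real contract would also have to build the reward transaction and enforce the deadline.

```python
import hashlib
import json


def commit(dataset, salt: bytes) -> str:
    """Hypothetical commitment: SHA-256 over a deterministic JSON encoding plus a salt."""
    payload = json.dumps(dataset, sort_keys=True).encode() + salt
    return hashlib.sha256(payload).hexdigest()


class Competition:
    """Off-chain model of the contest rules (a sketch, not the on-chain contract)."""

    def __init__(self, training_data, test_commitment: str, max_submissions: int):
        self.training_data = training_data      # step 1: published with the contract
        self.test_commitment = test_commitment  # step 1: hash of the hidden testing dataset
        self.max_submissions = max_submissions
        self.submissions = {}                   # step 3: participant -> submitted model

    def submit(self, participant: str, model) -> None:
        if participant in self.submissions:
            raise ValueError("participant has already submitted")
        if len(self.submissions) >= self.max_submissions:
            raise ValueError("submission limit reached")
        self.submissions[participant] = model

    def reveal_and_pay(self, test_data, salt: bytes, error_fn):
        """Step 4: check the revealed testing dataset, then return the winner."""
        if commit(test_data, salt) != self.test_commitment:
            raise ValueError("revealed testing dataset does not match the commitment")
        if not self.submissions:
            return None
        # The winner is the submission with the lowest error on the testing dataset.
        return min(self.submissions,
                   key=lambda p: error_fn(self.submissions[p], test_data))
```

Python's `min` returns the first minimal element, and dicts preserve insertion order. As a result, ties go to the earliest submission in this sketch. A real contest would need to state its own tie-break rule.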
The entire ML contest is encoded in and enforced by a smart contract. Compared with traditional ML contests, running them on Bitcoin has several salient advantages:
- Trustless and transparent: everything is open and public on the Bitcoin blockchain, so there is no counterparty risk and no cheating.
- Support for small-value competitions: thanks to Bitcoin’s support for micropayments, even competitions with a reward of less than 1 cent can be hosted, which was previously impossible.
- No middleman: anyone can host competitions or submit machine learning models without permission from any centralized authority such as Kaggle.
Gender Classification Perceptron as an Example
To demonstrate how this works in practice, we use a perceptron to solve a simple classification problem, as we have done before. The model yielding the smallest error on the testing dataset wins and receives the prize, all enforced on chain.
Practical Considerations
If the host decides not to reveal the testing dataset, no modeller gets paid. One way to mitigate this issue is to add another spending condition (e.g., another public function in sCrypt) that allows the bounty to be claimed after a certain time. In that case, all modellers split the bounty based on their accuracy on the training dataset.
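The fallback split can be sketched as integer arithmetic in satoshis. This is a hypothetical Python model of the payout rule only, not of the spending condition itself. On chain, the "after a certain time" check would typically rely on a transaction timelock such as nLockTime. The sketch makes three assumptions:

- Accuracy is measured as the number of correctly classified training samples.
- The bounty is split proportionally using floor division.
- Any leftover satoshis go to the most accurate modeller (an arbitrary choice).

```python
from typing import Dict


def split_bounty(bounty_sats: int, correct_counts: Dict[str, int]) -> Dict[str, int]:
    """Split a bounty (in satoshis) in proportion to each modeller's number of
    correctly classified training samples."""
    total = sum(correct_counts.values())
    if total <= 0:
        raise ValueError("no modeller has a positive accuracy to split on")
    # Floor division keeps every payout a whole number of satoshis.
    payouts = {p: bounty_sats * c // total for p, c in correct_counts.items()}
    # Assumption: leftover satoshis from rounding go to the most accurate
    # modeller (the first one listed, on ties).
    remainder = bounty_sats - sum(payouts.values())
    top = max(correct_counts, key=correct_counts.get)
    payouts[top] += remainder
    return payouts
```

For example, a 100-satoshi bounty split among three equally accurate modellers pays 34, 33, and 33 satoshis, so the full bounty is always distributed.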
Summary
An open and transparent marketplace for ML models will democratize access to ML models and lower the cost of acquiring them.