Training AI: Where Does the Data Come From? How to Build a Data Economy with Web3? ft. Tammy Yang, Founder of Numbers Protocol
EP.225
GM,
Recently, Academia Sinica released a large-scale Chinese language model called CKIP-Llama-2-7b. However, because Academia Sinica did not initially clarify the model's intended scope, the public mistakenly assumed it was meant for general use in Traditional Chinese. That misunderstanding led to inflated expectations and controversy.
Though it stemmed from a misunderstanding, the incident made many people realize that Taiwan cannot rely solely on external AI resources and needs its own localized large language models. Building such models, however, requires localized data. Data preparation is itself labor-intensive, and even settling on a data format that aligns with international standards is a challenge.
In this episode of Blocktrend, our guest is Tammy Yang, co-founder of Numbers Protocol. Tammy's background is quite unusual: she is a physicist who founded the AI startup DT42 before establishing Numbers, and she has hands-on expertise in both AI and blockchain. Unlike most online discussions of blockchain and AI, which focus on cryptocurrency investment, this episode offers a different perspective, emphasizing that data itself is a digital asset.
This episode includes:
What problems do DT42 and Numbers Protocol, an AI startup and a blockchain startup respectively, aim to solve?
Having founded both an AI startup and a blockchain startup, how does Tammy see the relationship between AI and blockchain?
How does data flow in Web2, and how does the data economy differ in Web3?
Taking the controversy surrounding Academia Sinica's LLM release as an example, what phenomena does Tammy observe, and how do they relate to Web3?
What does "open source standards suitable for small data sets" mean, and what differences might arise between small and large countries in AI development?
Additional Reading: