SAN FRANCISCO, Sept. 12, 2024 (GLOBE NEWSWIRE) -- Iterative, the company focused on simplifying AI workflows, today revealed that DataChain, its open-source tool for AI-based data processing released just last month, has gained significant traction among developers, surpassing 600 stars on GitHub.
DataChain marks a paradigm shift in how the industry handles unstructured data, addressing the critical need for a foundational layer in the modern data stack that operates without reliance on traditional SQL-based data warehouses.
According to a 2022 report by Komprise, 87% of IT leaders prioritize managing unstructured data growth, a significant increase from 70% in 2021. Moreover, IBM highlights that unstructured data comprises over 80% of all enterprise data, with 95% of businesses placing a premium on unstructured data management. As the world moves towards a future where IDC predicts 80% of global data will be unstructured by 2025, the need for innovative solutions like DataChain has never been more urgent.
“DataChain represents a fundamental shift in how AI and data engineering teams can manage and operationalize unstructured data at scale,” said Dmitry Petrov, founder and CEO, Iterative. “The post-modern data stack has long lacked a robust backbone for handling unstructured data. DataChain fills that void by enabling AI-driven data processing entirely within the Python ecosystem, eliminating the need for SQL islands and complex workarounds.”
DataChain enables directors of data, data engineers, machine learning engineers, and AI engineers to unlock the potential of their company’s vast unstructured data. It offers powerful tools for curating, managing, and operationalizing data for large language models (LLMs), computer vision (CV), and multimodal applications, transforming proof-of-concept projects into valuable, scalable product solutions.
Iterative is building a long waitlist for both DataChain and the newly improved DVC Studio. DVC open-source usage has grown by 24.8% since the third quarter of last year, reflecting the increasing demand for robust data management in the AI space. Additionally, Iterative continues to see a steady influx of external contributions to its repositories, including significant ones from leading industry players like Hugging Face.
Key takeaways include:
- Unlock the Potential of Unstructured Data: DataChain empowers organizations to manage unstructured data effectively, unlocking insights and driving solutions across various applications.
- A Paradigm Shift in Data Management: DataChain introduces a new approach to managing the vast amounts of data needed for generative AI, revolutionizing how businesses handle their information.
- From POC to Production: With DataChain, companies can seamlessly transition from toy or proof-of-concept projects to value-driven internal and external product solutions.
DVC brings agility, reproducibility, and collaboration into the existing data science workflow. DVC provides users with a Git-like interface for versioning data and models, bringing version control to machine learning and solving the challenges of reproducibility. DVC is built on top of Git: users commit lightweight metafiles to Git, while the large files themselves are stored outside the Git repository. It works with remote storage for large files, whether in the cloud or on on-premises network storage.
To learn more about DataChain and join the growing community of developers, visit DataChain on GitHub.
About Iterative
Founded in 2018, Iterative creates developer tools for AI engineers. The company has recorded more than 20M downloads for its open-source software DVC and earned more than 18,000 stars on GitHub. Iterative now has more than 400 contributors across its different tools and over 20 customers for its enterprise SaaS, including F500 companies like UBS. Iterative is backed by True Ventures, Afore Capital, and 468 Capital. For more information, visit dvc.ai.
Media Contact:
Joe Eckert, jeckert@eckertcomms.com
Ray George, ray@eckertcomms.com