Stop throwing money out the window and optimize your big data storage!
If you store your big data as CSV files, you can save up to 95% of your storage and query costs by converting to Parquet.
Cloud providers charge for the amount of data you store, and pay-per-query services also charge for the amount of data scanned by each query.
Apache Parquet vs. CSV Files
Data Lake Generator outputs data in the Apache Parquet format by default. Parquet is designed to support fast data processing on complex data. Parquet is a columnar, self-describing, compressed format which is optimized for speed and storage costs.
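To see why a columnar layout helps, here is a minimal Python sketch of the idea. It is not how Parquet is actually encoded: real Parquet files add metadata, encodings, and compression. The sample data and function names are made up for illustration, and it uses only the standard library.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
CSV_TEXT = """order_id,customer,country,amount
1,alice,US,10.50
2,bob,DE,20.00
3,carol,US,7.25
"""

def to_columns(csv_text):
    """Pivot row-oriented CSV text into a dict of column name -> list of values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def bytes_scanned_row_layout(csv_text):
    """A row-oriented text file has to be read in full, even for one column."""
    return len(csv_text.encode("utf-8"))

def bytes_scanned_column_layout(columns, wanted):
    """A columnar layout lets a reader touch only the requested columns."""
    return sum(len(v.encode("utf-8")) for name in wanted for v in columns[name])

columns = to_columns(CSV_TEXT)

# A query such as "sum of amount" needs only one of the four columns.
total_amount = sum(float(v) for v in columns["amount"])
row_bytes = bytes_scanned_row_layout(CSV_TEXT)
col_bytes = bytes_scanned_column_layout(columns, ["amount"])

print(f"total amount: {total_amount}")
print(f"bytes read, row layout: {row_bytes}")
print(f"bytes read, column layout: {col_bytes}")
```

The query reads every byte of the CSV text but only the amount values in the columnar version. That difference is what reduces the amount of data scanned per query.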
Compared with text formats (CSV, TXT), Parquet uses 85% less storage space on S3, query times are 30% faster, and the overall cost is around 95% lower.
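If you want to estimate what this could mean for your own bill, the arithmetic can be sketched as below. Only the 85% storage reduction comes from the figures above. The data volumes, the prices, and the reduction in data scanned are placeholder assumptions, so swap in your own provider's rates and your own measurements.

```python
def monthly_cost(stored_gb, scanned_gb, storage_price_per_gb, scan_price_per_gb):
    """Monthly bill = storage charge + charge for data scanned by queries."""
    return stored_gb * storage_price_per_gb + scanned_gb * scan_price_per_gb

# Hypothetical workload, invented for illustration only.
CSV_STORED_GB = 1000.0    # data at rest as CSV
CSV_SCANNED_GB = 5000.0   # data scanned by queries per month as CSV

# Placeholder prices, NOT real provider rates.
STORAGE_PRICE_PER_GB = 0.02
SCAN_PRICE_PER_GB = 0.005

STORAGE_REDUCTION = 0.85  # storage saving quoted in the text
SCAN_REDUCTION = 0.95     # assumption: depends on your columns and queries

csv_cost = monthly_cost(
    CSV_STORED_GB, CSV_SCANNED_GB, STORAGE_PRICE_PER_GB, SCAN_PRICE_PER_GB
)
parquet_cost = monthly_cost(
    CSV_STORED_GB * (1 - STORAGE_REDUCTION),
    CSV_SCANNED_GB * (1 - SCAN_REDUCTION),
    STORAGE_PRICE_PER_GB,
    SCAN_PRICE_PER_GB,
)
savings = 1 - parquet_cost / csv_cost

print(f"CSV: {csv_cost:.2f}  Parquet: {parquet_cost:.2f}  savings: {savings:.1%}")
```

The overall saving depends on how your bill splits between storage and scanning, so treat the result as a rough estimate and not a guarantee.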
A Data Lake eliminates the need for a report server by shifting the compute and storage for reporting analytics to the cloud.
Advanced analytics with machine learning become possible when you have a well-built Data Lake created by Data Lake Generator.