
Google Cloud: Leveraging Unstructured Data with Dataproc & GCP
- Explored methods to extract insights from unstructured data sources such as logs, text, images, or streaming data
- Learned how to integrate unstructured data into analytics pipelines to augment decision-making
- Applied processing techniques to clean, transform, and analyse unstructured formats in a cloud environment
- Demonstrated how unstructured data enriches models, expands feature sets, and supports more detailed reporting

Tools & Techniques:
- Apache Spark / Dataproc for distributed processing of unstructured data
- Text processing & NLP techniques (e.g. tokenization, TF-IDF, embeddings)
- Log parsing, pattern matching, and regular expressions
- Data pipelines combining structured + unstructured sources
- Cloud tools on GCP to orchestrate storage, processing, and integration
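
Illustrative sketch: the log parsing, tokenization, and TF-IDF techniques listed above can be shown in a few lines of standard-library Python. The log format, the sample lines, and the helper names (`parse_log_line`, `tokenize`, `tf_idf`) are assumptions made for illustration and are not taken from the course material. On Dataproc the same logic would more likely be written with Spark than with plain Python.

```python
import math
import re
from collections import Counter

# Assumed (hypothetical) log format: "<date> <time> <LEVEL> <message>"
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)$"
)


def parse_log_line(line):
    """Return a dict with ts, level, and msg, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(documents):
    """Compute per-document weights as tf * ln(N / df), with no smoothing."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty messages
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample input; the last line is deliberately malformed.
sample_lines = [
    "2024-01-05 10:15:00 ERROR disk quota exceeded on node 3",
    "2024-01-05 10:15:02 INFO job started on node 3",
    "not a log line",
]

# Structured fields come from the regex; malformed lines are dropped.
records = [r for r in (parse_log_line(l) for l in sample_lines) if r is not None]

# The free-text message becomes a sparse feature vector per record.
scores = tf_idf([r["msg"] for r in records])
```

Terms that appear in every message (here `node`) receive a weight of zero, while terms specific to one message (such as `disk`) receive a positive weight. This is how the free-text part of a log line can become a numeric feature alongside structured fields like `level`.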
