Hi Alvin users and fans,
We hope you had a fantastic April! Now that the dust has settled, we wanted to take a moment to reflect on the exciting updates and enhancements we brought to Alvin last month.
New Features & Improvements
SQL Lens
Many customers write complex SQL for which the visual lineage is too high-level, so we decided to take a shot at visualizing the relationship between a SQL statement and its structure (the AST, for you nerds). Anywhere in the product where we show a SQL statement, such as in Workloads or any of the lineage views, you can hit “view fullscreen” to get a visual, bi-directional navigator for the SQL. It has features like jumping to the definition of a CTE, and more. We are already planning automated insights on top of this work, namely “SQL insights”, which will flag problematic expressions and code quality issues such as unused CTEs or columns, or the infamous “SELECT *” from a table.
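For the curious, here is a rough idea of what checks like these look for. This is a deliberately naive, regex-based sketch, not how SQL Lens works (a real implementation would walk the parsed AST instead); the function name `sql_insights` is hypothetical, and the patterns will miss cases such as recursive CTEs or CTE names that collide with column names.

```python
import re

# Deliberately naive patterns; a real implementation would inspect the parsed AST.
CTE_DEFINITION = re.compile(r"(?:\bWITH\b|,)\s*(\w+)\s+AS\s*\(", re.IGNORECASE)
SELECT_STAR = re.compile(r"\bSELECT\s+\*", re.IGNORECASE)


def sql_insights(sql: str) -> list[str]:
    """Return human-readable code-quality warnings for one SQL statement."""
    findings = []
    for name in CTE_DEFINITION.findall(sql):
        # The definition itself is one occurrence; any further mention counts as a use.
        mentions = re.findall(rf"\b{re.escape(name)}\b", sql, re.IGNORECASE)
        if len(mentions) <= 1:
            findings.append(f"unused CTE: {name}")
    if SELECT_STAR.search(sql):
        findings.append("SELECT * used")
    return findings


query = """
WITH orders AS (SELECT * FROM raw.orders),
     unused_cte AS (SELECT id FROM raw.customers)
SELECT order_id FROM orders
"""
print(sql_insights(query))  # ['unused CTE: unused_cte', 'SELECT * used']
```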
Anomaly Detection and Monitoring for Workloads
We know the feeling: there is too much going on to keep an eye on everything. But no worries! With our new monitoring and alerting feature for workloads, you can simply enable monitoring for one or more workload types (dashboards and dbt models, for instance). The monitoring itself is based on a robust ARIMA model that takes seasonality and each Alvin customer's special holidays into account. Just reach out if you have any specific questions!
Docs can be found here: https://docs.alvin.ai/feature-guide/cost-anomaly-detection/monitors-and-notifications
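To give a feel for what seasonality-aware monitoring means, here is a much simpler stand-in for the ARIMA model described above: a seasonal naive forecast (the value one week earlier) combined with a robust z-score on the residuals. It is a sketch under our own assumptions (daily data, weekly seasonality, hypothetical function and parameter names), not Alvin's monitoring code, and it ignores holidays entirely.

```python
from statistics import median


def seasonal_anomalies(series: list[float], season: int = 7, threshold: float = 3.5) -> list[int]:
    """Return indices in `series` whose value deviates strongly from one season earlier.

    Simplified stand-in for a seasonal ARIMA model: the "forecast" for day t is
    series[t - season], and residuals are scored with a robust z-score based on
    the median absolute deviation (MAD). Holidays are not modelled.
    """
    residuals = [series[t] - series[t - season] for t in range(season, len(series))]
    if not residuals:
        return []
    center = median(residuals)
    # Guard against MAD == 0 (perfectly regular history): any deviation is then flagged.
    mad = median(abs(r - center) for r in residuals) or 1e-9
    return [
        t + season
        for t, r in enumerate(residuals)
        if abs(0.6745 * (r - center) / mad) > threshold
    ]


# Four weeks of daily cost with a weekday/weekend pattern and one spike on day 24.
daily_cost = [10.0, 11.0, 10.0, 12.0, 10.0, 3.0, 2.0] * 4
daily_cost[24] = 50.0
print(seasonal_anomalies(daily_cost))  # [24]
```

One weakness of this shortcut is that a spike also distorts the forecast one season later, so the return to normal can be flagged as a second anomaly; proper seasonal models such as ARIMA handle this far better.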
Lineage Grouping
From talking to customers, as well as building a data product ourselves, we have found that common patterns include scratch tables with MD5 hash suffixes and daily logging tables exported to tables suffixed with YYYYMMDD (this is very common in GCP/BigQuery). In addition, at companies that have been using BigQuery since before partitioning was available, the common pattern is (or was) to shard tables using the YYYYMMDD naming convention.
The problem with this is that it confuses “logical lineage” with “technical lineage”: what is really one logical table shows up as many physical tables. We’re happy to say that we have now solved this problem for all customers!
We’ve also added automatic detection of PR schemas, such as those created when using Datafold (the pattern here is that each PR gets its own schema named after the PR number).
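Conceptually, grouping boils down to mapping each technical table name to a logical one. The sketch below is not Alvin's actual rule set; it assumes BigQuery-style `dataset.table` names, date shards ending in YYYYMMDD, 32-character MD5 hex suffixes, and a hypothetical `pr_<number>` naming convention for PR schemas.

```python
import re

# Table-name suffixes treated as shards: a YYYYMMDD date or a 32-character MD5 hex digest.
SHARD_SUFFIX = re.compile(
    r"(?:(?:19|20)\d{2}(?:0[1-9]|1[0-2])(?:0[1-9]|[12]\d|3[01])|[0-9a-f]{32})$"
)
# Hypothetical PR schema naming convention, e.g. "pr_1234".
PR_SCHEMA = re.compile(r"^pr_\d+$", re.IGNORECASE)


def logical_name(fqn: str) -> str:
    """Map a technical "dataset.table" name to its logical (grouped) name."""
    schema, table = fqn.split(".", 1)
    if PR_SCHEMA.match(schema):
        schema = "pr_*"
    # Replace the shard suffix with a wildcard, e.g. events_20230401 -> events_*.
    table = SHARD_SUFFIX.sub("*", table)
    return f"{schema}.{table}"


tables = ["analytics.events_20230401", "analytics.events_20230402", "pr_123.orders"]
print(sorted({logical_name(t) for t in tables}))  # ['analytics.events_*', 'pr_*.orders']
```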
Auto-detection of workload types
With many different systems connected to the data warehouse, it can be hard to understand and attribute cost without a lot of manual effort and investigation. We’ve done the hard work for you by using a combination of usernames, labels, tags, comments and well-known query structures. The first iteration includes auto-discovery of jobs from:
- Looker
- Looker Studio
- Dataform
- dbt
- Google Sheets
- Hightouch
While building this, we also added some exciting capabilities that let Alvin customers override the detection and add their own “special” tags or query comments, so that jobs can be associated with an app, team, domain or owner. Documentation for this will follow at a later stage.
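As an illustration of the general idea, not necessarily how Alvin implements it, here is a sketch that attributes a job using three signals in order: a structured JSON query comment, job labels, and the service-account username. The comment format mirrors the JSON comment dbt prepends to its queries by default; the `app` label key, the "looker" username check, and the team/domain/owner comment keys are hypothetical conventions, since the override format has not been documented yet.

```python
import json
import re

# Hypothetical rules for illustration only.
JSON_COMMENT = re.compile(r"/\*\s*(\{.*?\})\s*\*/", re.DOTALL)


def detect_workload(query: str, user: str, labels: dict[str, str]) -> dict[str, str]:
    """Attribute a warehouse job to an app and, optionally, a team, domain or owner."""
    result = {"app": "unknown"}
    # 1. Structured query comments, e.g. the JSON comment dbt adds by default.
    match = JSON_COMMENT.search(query)
    if match:
        try:
            meta = json.loads(match.group(1))
        except json.JSONDecodeError:
            meta = {}
        for key in ("app", "team", "domain", "owner"):
            if key in meta:
                result[key] = str(meta[key])
    # 2. Job labels (hypothetical "app" label key).
    if result["app"] == "unknown" and "app" in labels:
        result["app"] = labels["app"]
    # 3. Username conventions, e.g. a dedicated Looker service account.
    if result["app"] == "unknown" and "looker" in user.lower():
        result["app"] = "looker"
    return result


dbt_query = '/* {"app": "dbt", "node_id": "model.shop.orders"} */\nselect 1'
print(detect_workload(dbt_query, "svc-dbt", {}))  # {'app': 'dbt'}
```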
Behind the Scenes: Tackling Bugs & Refining the Experience
We've also dedicated time to squashing pesky bugs and fine-tuning Alvin to ensure a smoother, more enjoyable experience for everyone. Some key accomplishments include:
- Parser improvements: We have worked quite a bit under the hood to make the parser faster, more correct and more robust, and to support new bleeding-edge SQL features such as “GROUP BY ALL”.
- Display improvements for entity and timestamp formatting: After using the product a lot ourselves, as well as talking to customers, we could not come up with the “one” true way of displaying entity names and timestamps. We could have spent more time, run focus groups and iterated toward the ultimate solution, but at the end of the day people are different. So we decided to let you pick your poison! Your choice is saved in local storage for your convenience.
On the Horizon: What's Next for Alvin
We're always looking forward, and here's a sneak peek at what we're working on for the coming months:
- Auto-assignment of workloads: Building on the work done to detect workloads, the entire product section will be refactored to assign workloads based on the most contextual identification available. That is, a query will always be associated with its dbt model or Looker dashboard, and only when it cannot be assigned to any actual system will it be labeled as a query fingerprint. This will make alerting and monitoring much smoother and avoid duplicate or overlapping alerts.
- Lineage superpowers: The lineage explorer will get a huge facelift, with capabilities such as getting all sources and destinations for a given table or dashboard, in addition to numerous UX and UI improvements to make navigation easier.
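Under the hood, “get all sources and destinations” is a graph traversal. Here is a minimal breadth-first sketch over a hypothetical list of (source, destination) edges; the function name and the table and dashboard names are made up for illustration, and the real lineage explorer works on Alvin's own lineage graph.

```python
from collections import deque


def reachable(edges: list[tuple[str, str]], start: str, direction: str = "downstream") -> set[str]:
    """Return every node reachable from `start` along lineage edges.

    `edges` holds (source, destination) pairs. With direction="upstream" the edges
    are walked in reverse, collecting all sources instead of all destinations.
    """
    graph: dict[str, list[str]] = {}
    for src, dst in edges:
        frm, to = (src, dst) if direction == "downstream" else (dst, src)
        graph.setdefault(frm, []).append(to)

    seen: set[str] = set()
    queue = deque([start])
    while queue:  # breadth-first search; `seen` also protects against cycles
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


edges = [
    ("raw.orders", "stg.orders"),
    ("stg.orders", "mart.revenue"),
    ("raw.customers", "mart.revenue"),
    ("mart.revenue", "looker.dashboard_42"),
]
print(sorted(reachable(edges, "mart.revenue", "upstream")))
# ['raw.customers', 'raw.orders', 'stg.orders']
```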
Your Feedback Matters to Us!
We're committed to continuously improving Alvin, and your feedback is essential to our progress. We'd love to hear your thoughts and suggestions on the recent updates, or any other ideas you have. Just reply to this email or reach out on LinkedIn or Slack.
Thank you for being a part of our journey!
The Alvin Team