Syntio Janitor
Product / 10.12.2020
If you manage an elaborate data platform, it is crucial to get the right data to the right place at the right time. You will also know that this can be a challenge, especially if your data producers frequently come up with new attributes.
Introducing Janitor by Syntio Data Engineering.
Having the right schema validation service is the most crucial part of processing data correctly. Schema validation makes sure that every little chunk of information that comes from your app, website, machinery, etc. is put in the right box and then sent to the right processing step. The schema always needs to be up to date, so you won’t miss anything.
New Attributes as a Bottleneck
But your data producers generate new attributes on a regular basis. Think of how regularly the apps on your phone need an update. All these updates mean that new data attributes become available to the data platforms behind those apps. This is a good thing, as more data means more insights and more ways to make better business decisions. Getting those new attributes ready for processing on your platform is crucial, because by ignoring them you could lose valuable time and, in a worst-case scenario, even a lot of money.
And that is where things sometimes get a bit tricky, because notifications can be sent late or not at all, or the purpose of certain attributes can be unclear. Let’s say you are working with a lot of machine-generated data and these machines just got a big firmware upgrade. As a result of this upgrade, the machines are sending a lot of new attributes to your data platform, and their use is completely unclear. If you do not update your schema until you have a complete mapping of the data, you run the risk that vital information on the state of these machines is not available and maintenance comes too late, potentially causing huge problems.
Cleaning up Schemas
Janitor is a schema registry component that makes sure that your data schema is always up to date. This way the data consumers on your platform never miss out on new attributes again. As soon as one of your data producers sends new attributes, or sends them in a new order, Janitor sends a message to either this data producer or your data engineers to notify them of the changes. This way the producer can check whether what they are sending is right, or the engineer can act on the change in attributes almost immediately.
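To make the idea of detecting new or reordered attributes more concrete, here is a minimal Python sketch. It is not Janitor's actual code or API: the names `SchemaDiff`, `diff_against_schema` and `notification_for` are hypothetical, and the sketch assumes that the schema is a plain ordered list of attribute names and that each message is a flat dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchemaDiff:
    """Hypothetical result type, not part of Janitor."""
    added: list      # attributes in the message but not in the schema
    missing: list    # attributes in the schema but not in the message
    reordered: bool  # shared attributes arrive in a different order


def diff_against_schema(schema_fields: list, message: dict) -> SchemaDiff:
    """Compare one incoming message with the registered attribute list."""
    incoming = list(message.keys())
    added = [name for name in incoming if name not in schema_fields]
    missing = [name for name in schema_fields if name not in incoming]
    # Compare the order of only those attributes that both sides know.
    shared_in_schema = [n for n in schema_fields if n in incoming]
    shared_in_message = [n for n in incoming if n in schema_fields]
    return SchemaDiff(added, missing, shared_in_schema != shared_in_message)


def notification_for(diff: SchemaDiff) -> Optional[str]:
    """Build a notification text, or return None if nothing changed."""
    parts = []
    if diff.added:
        parts.append("new attributes: " + ", ".join(diff.added))
    if diff.missing:
        parts.append("missing attributes: " + ", ".join(diff.missing))
    if diff.reordered:
        parts.append("attribute order changed")
    return "Schema change detected (" + "; ".join(parts) + ")" if parts else None


# Example: a firmware upgrade adds an attribute the schema does not know yet.
diff = diff_against_schema(["id", "temp"], {"id": 1, "temp": 20.5, "fw_version": "2.0"})
print(notification_for(diff))
```

In a real setup the notification text would be handed to whatever channel reaches the data producer or the data engineers; that part is left out here on purpose.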
At the same time, Janitor can interpret the incoming data flow and adjust your data schema automatically, so all attributes become directly available to the data consumers on your platform. Janitor can even help you categorize previously accumulated data under the new schema. All previous schema versions are stored on the component in an easily navigable way, so getting back to a previous setup should never be a problem.
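The combination of automatic schema adjustment and stored versions could look roughly like the following Python sketch. Again, this is an illustration under assumptions, not Janitor's implementation: the class `VersionedSchema` and its methods are hypothetical, attribute types are simply inferred from the Python type of the first value seen, and older records are fitted to the new schema by filling unknown attributes with `None`.

```python
class VersionedSchema:
    """Hypothetical in-memory schema with full version history."""

    def __init__(self, fields: dict):
        # Each entry maps attribute name -> type name; index 0 is version 1.
        self._versions = [dict(fields)]

    @property
    def version(self) -> int:
        return len(self._versions)

    @property
    def current(self) -> dict:
        return dict(self._versions[-1])

    def evolve(self, message: dict) -> bool:
        """Add attributes unknown to the current schema as a new version."""
        latest = self._versions[-1]
        new_fields = {
            name: type(value).__name__
            for name, value in message.items()
            if name not in latest
        }
        if not new_fields:
            return False  # nothing new, keep the current version
        self._versions.append({**latest, **new_fields})
        return True

    def get(self, version: int) -> dict:
        """Return an earlier schema version (1-based)."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown schema version: {version}")
        return dict(self._versions[version - 1])

    def upgrade_record(self, record: dict) -> dict:
        """Fit an older record to the current schema; gaps become None."""
        return {name: record.get(name) for name in self._versions[-1]}


schema = VersionedSchema({"id": "int", "temp": "float"})
schema.evolve({"id": 1, "temp": 20.5, "fw_version": "2.0"})
print(schema.version, schema.current)
print(schema.upgrade_record({"id": 7, "temp": 19.0}))
```

Keeping every version as its own immutable snapshot is what makes going back to a previous setup cheap: an old version is looked up, never reconstructed.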
Janitor is an open-source component and free to download from GitHub. It works with virtually every data format, so setting it up won’t be much of a problem. For more information, remarks or suggestions, contact us at Janitor@syntio.net.