The Celigo iPaaS is a cloud-based platform for designing, debugging, running, and monitoring complex automations and data flows, with security, scalability, reliability, and connectivity built in. By using a platform instead of developing custom code for key automation projects across applications, companies can run faster, leaner operations.

Celigo uses its own platform to make product development more efficient: automating massive data loads for AI/ML training, syncing usage data across all teams, automating product activation and licensing, and more.

In this on-demand webinar, Scott Henderson, Chief Technology Officer at Celigo, shares how Celigo engineers use integrator.io to efficiently solve important problems.

Topics discussed:

  • Syncing data between different applications and technologies
  • Processing data tasks at scale
  • Automating workflows spanning different applications


Full Webinar Transcript
Hi, everybody. Thank you all for coming today. My name is Kathleen Velasquez, and I’m the product marketing manager for services here at Celigo. I’d like to introduce our CTO, Scott Henderson, who will be giving a presentation today about how our engineering team leverages our iPaaS to accelerate development. Before we get started on the presentation, I just wanted to get one housekeeping item out of the way. If you have any questions, we’ve set aside some time at the end for Q&A, so feel free to send your questions through the chatbox and we’ll answer them then. Without any further ado, take it away, Scott.
Thank you so much, Kathleen. Okay, hey, everyone. So the title of this is Leveraging Integration Platforms to Accelerate Engineering. And really, if I were to say this differently, we are just like many other software or technology companies out there, and this webinar is all about how we use our own product, integrator.io, to solve important problems that ultimately make us a better, more efficient department, but also a better company. And then ultimately what we’re trying to do is build the best possible product. Okay, so real quick, just Celigo. Celigo is the name of our company, and integrator.io is the name of the product that we’re building. I’m not going to go too in-depth on this slide other than to say that every department at Celigo is using integrator.io to solve different types of problems. They all have their own setups, and we’re all in the same account. In addition to that, there’s also this concept of pre-built integrations that you can download from our marketplace. An example of that is the sales and marketing team. They don’t want to go build all of this custom stuff, so they’ll install those apps for NetSuite and Salesforce to get up and running much faster. So those are the use cases for them. And then in engineering, we’re doing much lower-level designs and using the product more like a tool. Go to the next slide. So a quick overview of our team.
We’re very distributed, and we’ve been this way since the very beginning. The engineering department is located in two places: San Mateo, in the Bay Area, and Hyderabad, India. The majority of people are in Hyderabad. The tech stack that we use in engineering is all of these different apps and technologies, and there’s lots of them. In general, we try to use the latest, greatest stuff. We really, genuinely love technology here, and our own product is very much one of these bubbles. And that’s what we’re going to talk about today: how we use our own product as a part of our tech stack. Okay, so I’m about to jump into the product, and here’s an overview of how I’m going to do this. The first thing I want to answer is, what do we like about integrator.io? As engineers, we have lots of choices. There are lots of other apps that solve a similar problem, there are lots of open source technologies and frameworks that we can deploy, and last but not least, we can always build stuff, and engineers love to build things. So why do we like integrator.io? Secondly, I’m going to give an overview of how our account is organized. I mentioned that the whole company is using the same account, and I’ll give you a view of how that’s done and how we all share the same account. And then last, we’ll get into what problems we are solving. Really, there are three key problems that integrator.io allows us to solve for ourselves and also for the benefit of everyone at the company. The first one is enabling access to product usage data. The second is that we use integrator.io to feed algorithms with product usage data, the stuff that’s happening in the product. And the last one is that we want to leverage NetSuite, which is our financial system, to control all the licensing and provisioning of all the users and accounts that are out there. Okay, so without further ado, I shall jump over to the product.
And the first thing, again, that I’m going to cover is what we like about integrator.io as an engineering department. First and foremost, we love that it’s a web app. It’s just something that we all sign into from our browser. We don’t have to go SSH into servers or do any type of techy thing. It’s just a very simple browser application. We love that all the roles and permissions are baked in. I can add a new user and add them to a very specific place in the product, and security is super important to us. So having that real simple web interface to add users, let them get working, and then control what they can do is a big deal. We like having a dashboard, a UI-based dashboard. We love email alerts, because we’re running a lot of really important stuff, and when stuff fails, here’s a preview of what we get. We get emails that say, “Hey, an error happened,” or “An error was resolved by someone else.” Those little things just make our lives a lot easier when maintaining these things at scale. We love that integrator.io has all the compliance that we need. Again, security is important. We need to talk HIPAA, GDPR, Privacy Shield, and that just comes with it. Another way of saying all that is that we love integrator.io because there’s nothing for us to deploy, host, operate, scale, or secure. We simply build the flows that we want, click run, and then they run on an automated basis. The second dimension is that we like that all the connectors we need are built in. If I want to connect to something, there are 200-plus applications in here and lots of different technologies. We’ll see MongoDB and Snowflake, lots of specific apps like NetSuite and Salesforce, and also universal connectors for HTTP or APIs, FTP, and so on. We definitely don’t want to go build that. It’s really nice having that connectivity all there, ready to use. And then outside of connectivity, there is a ton of really good tooling.
So let me just load up one of these flows to give a quick example. Sorry, wrong one. So this is what we call Flow Builder. When I want to design workflows and data automations, this is a really great design tool that lets me build the different bubbles that do different things, and I can preview things and route the data through all these different steps, seeing what it looks like before and after. I can run these things in test mode. I can make copies of them and then make the changes back in production, and so on. So there’s just a lot of really important tooling that makes our life easier. And then the last thing I’ll say is that the thing I love the most, the delight factor, is that in minutes you can launch a new flow that syncs millions of records, and I just had to do this last week. You define the flow, you click run, and it runs, and you don’t have to think about anything other than the end result that you want. I don’t have to think about a million records versus a thousand records. It just magically works, and that’s really the delight factor for me.
Okay. So now I’m going to give a quick overview of how we are organized, and again, we’re sharing this with everyone at Celigo. The foundational elements of organizing an account are these tiles, and you can think of these tiles as workspaces. If I click into one of these tiles, there are two core elements here. One is the users. I can add any number of users in my organization to this tile, and I can assign them one of two roles. The first is a monitor role, where they are only allowed to run the things that have been defined for them, and they can also troubleshoot errors, look at their dashboards, and retry things. The other role is manage, which means they can go in here and build things and change things. They have full access to the workspace itself. So here I add users and assign a role, and then the other dimension is the connections.
So here’s where I can say, hey, this workspace has access to my production InfluxDB, my production MongoDB, the production PostgreSQL tables that we use as our repository, and the production Snowflake. This allows me to link up people with specific systems and control the access. I don’t want someone that I don’t know accessing a production system in any way, shape, or form, and here’s how I group people with apps. You can also make copies of these, put them in a sandbox, and have different people work on those copies, which have access to the sandbox versions of the same systems. That’s a pattern we also use. But I love that I can just group people with apps and control that security aspect of the larger organization. Okay, so again, for this webinar, we’re talking only about engineering and how we use the product. I’m going to focus on just these top two tiles that I’ve conveniently moved up to the top left, and then we’ll look at the problems and how we solve them.
So the first problem that I brought up was that we want to enable company-wide access to product usage data. What this really means is that the rest of the company wants access to all the data that’s sitting in our product’s production databases. Our MongoDB instance is what we use to store pretty much everything you’re seeing here, and we have an InfluxDB instance that stores statistics, time-based data. The rest of the company wants access to that. But of course, in engineering, we don’t want to give access. In fact, outside of our core DevOps team, we don’t give anyone access to those systems. So how do we solve that? Well, we have to get the data out and put it somewhere else where everyone can use it, in a way that doesn’t affect those systems as they’re running in production. MongoDB and InfluxDB are also not the types of databases that our downstream teams want to work with.
They want to work with SQL-based stuff because that’s what they’re familiar with, and that’s what analytics apps like Domo read from. So that’s another challenge, where we need to translate the data from MongoDB and InfluxDB into this more common format. And again, that’s what we’ve used integrator.io to do: pull the data out of those other places and put it into a set of SQL tables. I’ll go into the flows in a second, but real quick, here’s a preview of the PostgreSQL data repository that we’ve been using for some time. On the left, you can see all the different tables that we have. We have a table for every kind of object that’s in the product, like flows, connections, API tokens, etc. Once the data is in these tables, the rest of the business can use it. To provide a very quick example, we just recently launched single sign-on, and the product management team is already asking the question, “Who’s using it, and where is that information?” It’s in MongoDB, but of course, they need to be able to see it somewhere else. So we’ve synced that to these databases, and they can now run queries to say, “Okay. Since launching, we have such and such number of users already turning on the SSO feature.”
Okay. So now what do the flows look like to build this? You’ll see that, in the tile’s flows tab, I have everything organized into these tabs. All the PostgreSQL syncs are here, and you’ll see that there is a flow for every specific collection or record type that we’re trying to bring into the PostgreSQL tables. If we look at just one of these, we can see it’s very simple. We extract the data out of MongoDB with an export. Here, you see the collection, which is what MongoDB calls it. You can add a filter. You can use a projection to say, only include these fields, or do not include these fields. We use that with connections, because we don’t want to send any type of encrypted data into the Postgres tables.
And then I have it set up as a delta because I’m always feeding in the most recent info. I run this every day, and I don’t want to sync the whole database every time, so it’s just getting what happened recently and keeping everything fresh. After we get the data, we send it to Postgres. Here we have a similar kind of drawer to help you do that. There’s a SQL builder, as you can see. This is where you type your SQL statement out. I’m just doing an insert, and if there’s a conflict on my unique constraint, then I do an update. You can see all the resources that you have available when you’re building these templates out, and you can preview. So again, this all goes back to the tooling that I talked about. I can build each piece and use the sample data that’s part of the flow to preview what all these things look like. And then finally, you can run it. You can run it in test mode, where it does just a handful of records, and then you turn it on and run the mass sync. Really, all of those flows follow the same pattern, where we extract, we transform, and then we load. That is a pattern called ETL that probably a lot of people are familiar with, and it works really well. In addition to all the flows, we have notifications set up, so if any error happens on any of them, we get alerted with an email. If the same data for some reason syncs twice and it works the second time, as often happens when you’re syncing data all the time, it will actually auto-resolve the original error for you and tell you that the data is now good. So that’s pretty cool. You’ll get these emails. I have one right here. It’ll just say, “Hey, we resolved this error for you.” And I love that. I get an email that there was an error, and then a little bit later, I’ll see an email that says, “Hey, actually, I resolved it.” There’s also dashboarding.
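The delta export, field projection, and insert-or-update step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual flow configuration: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL (both accept `INSERT ... ON CONFLICT ... DO UPDATE`), a plain list of dictionaries as a stand-in for a MongoDB collection, and the `connections` table, its columns, and the sample documents are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical documents standing in for one MongoDB collection.
SOURCE_DOCS = [
    {"id": "c1", "name": "NetSuite prod", "last_modified": "2021-03-01T08:00:00+00:00", "encrypted": "***"},
    {"id": "c2", "name": "Salesforce prod", "last_modified": "2021-03-02T09:30:00+00:00", "encrypted": "***"},
    {"id": "c3", "name": "FTP drop", "last_modified": "2021-03-03T11:15:00+00:00", "encrypted": "***"},
]

def delta_export(docs, since, fields):
    """Keep documents modified after `since`, projected down to `fields`."""
    return [
        {name: doc[name] for name in fields}
        for doc in docs
        if datetime.fromisoformat(doc["last_modified"]) > since
    ]

# Insert a row; if the unique key already exists, update that row instead.
UPSERT_SQL = """
INSERT INTO connections (id, name, last_modified)
VALUES (:id, :name, :last_modified)
ON CONFLICT (id) DO UPDATE SET
    name = excluded.name,
    last_modified = excluded.last_modified
"""

def load(conn, rows):
    """Upsert every exported row into the SQL table."""
    conn.executemany(UPSERT_SQL, rows)
    conn.commit()

def make_db():
    """Create an in-memory database with the (hypothetical) target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE connections (id TEXT PRIMARY KEY, name TEXT, last_modified TEXT)"
    )
    return conn

if __name__ == "__main__":
    conn = make_db()
    since = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
    # The projection leaves the "encrypted" field behind.
    rows = delta_export(SOURCE_DOCS, since, ["id", "name", "last_modified"])
    load(conn, rows)
    load(conn, rows)  # syncing the same data twice does not duplicate rows
    print(conn.execute("SELECT COUNT(*) FROM connections").fetchone()[0])
```

Because the load step is an upsert, rerunning a sync is safe, which fits the point made above about the same data sometimes syncing twice.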
So this is going to load the last 30 days of activity, and it takes a while because there’s a lot of stuff going on here. But basically, you can see visually what’s going on across all your flows. You can see the average processing time, where there are errors, and so on, to look at trends and spikes. This big spike right here is because of a project we were working on, which I’m going to show in a second. We decided to move to Snowflake instead of PostgreSQL, and we’re making those moves to transition to a different data warehousing technology than just the Postgres tables. So this was me doing big syncs to switch all the data over to run in both places.
Okay. So back to the flows tab, and what I want to do is show you this stuff. This is hot off the press, but we decided to go to Snowflake. The PostgreSQL tables have been awesome, but they can only get so big. The performance is suffering, and there are a lot of little drawbacks. The biggest drawback, I would say, is the fact that every time you want to add a new field or enhance things, you have to update these data flows and resync all the data again. That pattern works, and it has worked for a long time. But what’s really awesome about the way Snowflake does it is that instead of you transforming the data and sending them the finished schemas, you can send them the raw data, just the actual MongoDB JSON data and the InfluxDB JSON data. We can send them that information raw, and then you transform on their side. They are super fast and powerful at transformation. So it inverted what we needed to do in the flows. Where you see 18 or 19 flows here doing all the Postgres tables, I was able to accomplish the same data syncs with just one flow, where I use the exact same extracts from MongoDB. It’s pointing to the same bubbles in both sets of flows.
But then I run them all generically into one single uniform Snowflake insert. Using a bulk insert, I load it into a staging table, a concept that Snowflake has. It’s a static mapping of just three fields, and one of the fields is just the raw data. Actually, maybe you can load this. One of the fields is a variant, and we’re just serializing the raw JSON data into it so that Snowflake now has this raw information. And so what’s really cool now is Snowflake always has the latest info. It always has all the data. And when we want to iterate on the views and schemas and the ways that we interpret the stuff, it’s no longer something we have to do from the pipelines, the way we did in Postgres, where we had to go back and update these flows to change that. We can do it directly in Snowflake itself. When I was first experimenting with Snowflake, I was wondering how fast Snowflake is. And they’re lightning-fast. The big delight factor for me with Snowflake was that I could run operations on millions and millions of records, and it would take just seconds to do that. And so real quick, while we’re looking at Snowflake, here’s the second flow, just to see what this is. I mentioned that you load everything into a staging table with Snowflake, and then you run what’s called a merge command, where you merge two tables based on the ID column of the data. And this, again, I was really delighted by. I was merging millions and millions of rows from one table to another, and it took just one or two seconds to run on their side. So basically, we merge the tables after we’ve posted all the updates. And then finally, we delete that staging table because we’re done with it. The next day, when these things run again, it will refresh. It’ll build up the staging table again with all the changes from the day before, then it’ll merge, and then it will delete.
And then we just run this flow every single day. So right now we have these Postgres tables that are being updated, and in parallel, with the exact same sources, we’re running a similar sync to Snowflake. Here’s my Snowflake account, where I’m just searching for the same flow that I searched for on the other one. Except on the Snowflake side, we have the actual raw records that we store in our MongoDB database. And then we build views from this to make SQL-friendly tables for downstream apps like Domo and so on. Okay. So again, that’s a big problem. Every software company that has a SaaS app, with a database holding all the useful info about what people are doing, needs access to that data in all these downstream apps for business teams to operate on. And this is just a great way to solve that, in a way where you don’t have to sync all your data and you don’t give them raw databases built on technologies they’re not familiar with. Instead, you give them a copy, in a format they’re used to looking at, that’s safe, and you control the actual information that flows out. And it’s just worked out really well here.
Okay. So the next problem that we solved with integrator.io is this: we have algorithms in our product that rely on the data in our product, on what all the users are doing. We use that information to help power these algorithms. Before I show those things, let me give a quick example of what I’m talking about. I have a little test flow here. This is just a fake flow, but I’ll show how this is powered. Here, we’re listening for Salesforce contacts. There are contact events in Salesforce, and whenever one is saved, it sends the contact over here. Then we’re going to import that into NetSuite as contact records. And so now we’re on the mapping screen. That first part’s easy. Like, “Hey, I want to do Salesforce to NetSuite. And then, I want to do contacts.
And I want to keep them in sync.” Piece of cake. Now, the painful part is, “Okay, there are so many fields in these systems, in NetSuite and Salesforce, and it’s a really overwhelming, tedious task to always go in and have to set this.” So we built this auto-map fields button. It’s in beta right now. It just launched. I click it, and it’s not perfect, but it gets you a lot. It makes it much easier to get started with your mapping. It populates the set. There are probably multiple hundred fields, and it gives you a starting set to work with. Then you come in and prune what you want, and so on. So how do we do this? Let’s go back up to the flows. The way this works is that we look at all the mappings that are in our main primary product database. We’re always doing this in a delta way, where we’re getting the latest stuff. So every day, we get the latest mappings that were updated. Maybe someone created a new mapping between two systems, or maybe it’s an updated one, but we’re always getting that latest info, and we’re putting it into a Postgres table. Most likely, this will switch over to Snowflake as well. After we get the mappings, we start to collect information about those mappings. For example, what are the apps? Here, we’re just going back into our Postgres tables to get that info, which is being synced by that other set of flows that I showed already. We get the app information about all the bubbles that are in flows. Here’s a bubble here and a bubble here. That’s Postgres. So we start to collect that information. After we get the apps, we then need to deal with the record types for those apps. So this is a Salesforce contact and a NetSuite contact. We gather that information, and we augment these tables with it. And then after we get the record types, we go in and get the field mappings for those record types.
So we’re augmenting with the fields that live in the record, which lives in the app, and we’re building up these tables that have more and more information. And then finally, we go and look at the usage stats from our InfluxDB instance, or rather, the data that we’ve synced from InfluxDB, because we want to know whether these mappings that people are using are actually working out there. This mapping has successful stats, whereas that mapping has a lot of 50/50 stats, or maybe this one is all errors. We start to assign these stats to the mappings themselves, and that gets augmented onto our table. Finally, we take all that information and build a file, and we transfer that into a bucket, a folder in S3. On our servers, we have a server layer that listens for and ingests these daily files and augments the algorithms that are running with the latest information. We love this because it’s a total decoupling of getting the analytics, building these daily files, and then posting them somewhere they can be ingested. We can really control how we get the mappings, like paid customers only, and we can choose who we want to get them from. And again, we love that it’s all automated and running daily, because Automap may be terrible for a certain set of systems, but if we go in with some of our own accounts and start to map them out and run flows and get things working, then the next day we can have results to share with the next person who clicks the Automap button. So it’s a way to quickly, organically grow the abilities of that Automap button. And again, that’s a beta feature, and a lot of this is still in progress. But this gives you a sense of how we’re using integrator.io to do these kinds of data automations and then feed stuff back into our product in a really secure, decoupled manner, where we have a lot of control over all the different pieces.
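The step of assigning usage stats to mappings and building a daily file can be illustrated with a small sketch. Everything here is an assumption made for illustration: the transcript does not describe the file format, the scoring rule, or the field names, so the record shapes, the `success_rate` calculation, and the JSON output below are hypothetical, and the S3 upload is left out.

```python
import json
from collections import defaultdict

def score_mappings(mappings, flow_stats):
    """Attach success/error totals to each distinct field mapping.

    `mappings` holds one record per mapped field per flow; `flow_stats`
    maps a flow id to its success and error counts.
    """
    totals = defaultdict(lambda: {"success": 0, "error": 0})
    for m in mappings:
        stats = flow_stats.get(m["flow_id"], {})
        key = (m["source_app"], m["target_app"], m["source_field"], m["target_field"])
        totals[key]["success"] += stats.get("success", 0)
        totals[key]["error"] += stats.get("error", 0)

    scored = []
    for (source_app, target_app, source_field, target_field), t in totals.items():
        runs = t["success"] + t["error"]
        scored.append({
            "source_app": source_app,
            "target_app": target_app,
            "source_field": source_field,
            "target_field": target_field,
            "runs": runs,
            # Mappings with no recorded runs get a neutral score of 0.0.
            "success_rate": t["success"] / runs if runs else 0.0,
        })
    # Best-performing, most-used mappings first.
    scored.sort(key=lambda row: (-row["success_rate"], -row["runs"]))
    return scored

def build_daily_file(scored):
    """Serialize the scored mappings as the contents of the daily file."""
    return json.dumps(scored, indent=2)

if __name__ == "__main__":
    mappings = [
        {"flow_id": "f1", "source_app": "salesforce", "target_app": "netsuite",
         "source_field": "Email", "target_field": "email"},
        {"flow_id": "f2", "source_app": "salesforce", "target_app": "netsuite",
         "source_field": "Email", "target_field": "email"},
        {"flow_id": "f3", "source_app": "salesforce", "target_app": "netsuite",
         "source_field": "Email", "target_field": "custentity_email"},
    ]
    flow_stats = {
        "f1": {"success": 90, "error": 10},
        "f2": {"success": 10, "error": 0},
        "f3": {"success": 5, "error": 5},
    }
    print(build_daily_file(score_mappings(mappings, flow_stats)))
```

Keeping the scoring and file-building separate from the consumer mirrors the decoupling described above: the ingesting service only needs to understand the file, not where the stats came from.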
Okay, so that was the second problem that we’ve solved, or are still solving. The last thing I wanted to show is that we leverage NetSuite for all our licensing and provisioning. So why? In engineering, we don’t want to build a licensing page. We don’t want to have to host that internal backdoor stuff for people. It’s just a pain. On top of that, you ultimately need to automate all that stuff at scale. And all of our customers, their info, and their subscriptions, all that stuff is in NetSuite. That’s our master source of truth for that. So really, we want it all to be controlled from that place, and to have these flows just keep our product’s database up to date. So let’s go look at how that’s built. We have a bunch of flows, and I’ll focus on just one flow initially, and then I’m going to show you a quick view in NetSuite. So here’s our NetSuite account, and this is the license record for the account that we’re looking at right now, our corporate account. It’s really standard stuff: the link to a subscription, when it expires, how many flows, what the entitlements are, etc. When you change stuff here, there’s a listener set up, so we’re listening for changes to this. When those changes come through, we route them. If it’s for North America, we route it to our North America MongoDB instance. If it’s for the EU, we route it to the EU instance. And then finally, we send the results back into the NetSuite account so the people making changes there can see whether it worked or it didn’t work. We’ve had this going for a long time, and this pattern has worked out really well. Why were there all those other flows? It’s because once we started doing the licensing with NetSuite, people in NetSuite were like, “Hey, well, I really actually want to see more than just the license record.
I need to know more about this because I’m about to downgrade them or upgrade them. And I need to know just more about this account before I feel comfortable changing something.” So then they asked us for all the information about the account. So now I’m going to click into a thing that we built, and these are all just custom records. You can see the account, who the owner is, and all the users that they have in there. These are all the people in our own single account. You can see all the flows that we have running, the connections, the licenses. The team operating on it in NetSuite really wanted that visibility, so we just built all these flows really quickly that sync information from the Mongo database into these NetSuite custom records. You can see some of them also have extra steps. When someone new signs up to our database, we create their stuff in NetSuite, and we also create a user for them in Litmos so they can access our university program. And then we normally will tell Slack, because people like to watch the Slack channel of people signing up, and so on. So again, I think that covers this last problem: we want to leverage NetSuite for provisioning and licensing. And integrator.io is a great way to decouple those systems and fully automate everything. And then, of course, once we started doing that, people wanted more. But again, integrator.io allowed us to quickly build out those data flows to populate the rest of the information that they wanted, and it’s just been a really great solution overall. So that covers the last problem that I wanted to go through. There are other things we do on the account in engineering, like intercepting bounced emails. We have listeners for our GitHub check-ins, where we post them to Slack or reformat them. So there’s a lot of fun, silly stuff we do as well.
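The license flow described a little earlier (listen for a change in NetSuite, route it to the North America or EU database, then report the result back) comes down to a small routing step. The sketch below is a hypothetical illustration only: plain dictionaries stand in for the regional MongoDB instances, and the region codes, field names, and result format are assumptions, not the real record layout.

```python
def route_license_change(change, databases):
    """Apply a license change to the database for its region.

    `databases` maps a region code to a dict standing in for that region's
    product database. The returned dict is what would be written back to
    the license record so the person who made the change can see the outcome.
    """
    license_id = change.get("license_id")
    region = change.get("region")
    target = databases.get(region)
    if target is None:
        return {
            "license_id": license_id,
            "status": "failed",
            "detail": f"no database configured for region {region!r}",
        }
    target[license_id] = {
        "expires": change["expires"],
        "max_flows": change["max_flows"],
    }
    return {
        "license_id": license_id,
        "status": "synced",
        "detail": f"written to {region}",
    }

if __name__ == "__main__":
    # Hypothetical region codes and empty stand-in databases.
    databases = {"NA": {}, "EU": {}}
    changes = [
        {"license_id": "L-1", "region": "NA", "expires": "2022-01-01", "max_flows": 50},
        {"license_id": "L-2", "region": "EU", "expires": "2022-06-30", "max_flows": 10},
        {"license_id": "L-3", "region": "APAC", "expires": "2022-06-30", "max_flows": 10},
    ]
    for change in changes:
        print(route_license_change(change, databases))
```

Returning a result for failures as well as successes matches the point made above: whoever edits the license in NetSuite gets feedback either way.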
But those other things I showed were really the primary business, mission-critical things we’ve solved, where integrator.io has just been an incredible tool to have on the shelf. Okay, so let me go back to our slides. I think we’re done with the main demo, and I think we are ready for Q&A.
Yeah, thank you, Scott. It’s Q&A time now, so I’ll be checking the comments for any questions. If any of you have a question to ask, feel free to type it into the chatbox. I’ll give it a minute and wait to see if anybody has any questions. Okay. Let’s see. Okay. Well, I think it’s good to go to the next slide. So thank you, everybody, for joining us today. I hope you enjoyed the presentation and learned about how an iPaaS can benefit product engineering. If you have any questions, feel free to contact us at any time. We should have our contact information here on the slides, so feel free to reach out. Thanks again so much for attending this webinar. We really appreciate everybody coming out here today. Thank you. Bye.

About the speakers

Kathleen Velasquez

Scott Henderson

Scott Henderson has worked in the enterprise software space since 2003, focusing largely on emerging technologies and applying them to business software integration problems.

Scott was Celigo’s first engineer and has since led the company through multiple technology shifts. He is always reading and toying with new ideas and loves working with talented developers that have a knack for both engineering and design. Scott currently oversees Celigo’s product engineering operations along with Celigo’s product technology strategy.

Scott holds a B.S. in Electrical Engineering and Computer Science from the University of California, Berkeley.

Meet Celigo

Celigo automates your quote-to-cash process with an easy & reusable integration platform-as-a-service (iPaaS), trusted by thousands of eCommerce and SaaS companies worldwide.

Use it now and later to expedite integration work without adding more data silos, specialized technical skillsets or one-off projects.