The right technology choices can accelerate success for a cloud-born business. This is true for the fintech start-up clearTREND Research. Their solution architecture team knew one of the most important decisions would be the choice between a SQL and a NoSQL database. After research, experimentation, and many design iterations, the team was thrilled with their decision to deploy on Microsoft Azure Cosmos DB. This blog is about how their decision was made.
Data and AI are driving a surge of cloud business opportunities, and one technology decision that deserves careful evaluation is the choice of a cloud database. Relational databases continue to be popular and remain in strong demand for cloud-based solutions, but NoSQL databases are well suited for distributed, global-scale solutions.
For our partner clearTREND, the plan was to commercialize a financial trend engine and provide a subscription investment service to individuals and professionals. The team responsible for clearTREND’s SaaS solution is a veteran group of software developers and architects who have been implementing cloud-based solutions for years. They understood the business opportunity and wanted to better understand the database technology options. Through their due diligence, the architecture morphed as business priorities and data sets were refined. After a lot of research and hands-on experimentation, the architectural team decided on Azure Cosmos DB as the best fit for the solution.
Business models are under attack, especially in the financial industry. Cosmos DB is a technology that can adapt, evolve, and allow a business to innovate faster in order to turn opportunities into strategic advantages.
Six reasons to choose Cosmos DB
Below are reasons the team at clearTREND selected Cosmos DB:
- Schema design is much easier and more flexible. With an agile development methodology, schemas change frequently and the ability to quickly and safely implement changes is a big advantage. Cosmos DB is schema-agnostic so there is massive flexibility around how the data can be consumed.
- Database reads and writes are really fast. Cosmos DB can provide read and write latencies of less than 10 milliseconds, backed by a service level agreement (SLA).
- Queries run lightning fast and auto-indexing is a game-changer. Reads and writes based on a primary or partition key are fast, but for many NoSQL implementations, queries executed against non-keyed document attributes may perform poorly. Secondary indexing can be a management and maintenance burden. By default, Cosmos DB automatically indexes all the attributes in a document so query performance is optimized as soon as data is loaded. Another benefit of auto-indexing is that the schema and indexes are fully synchronized so schema changes can be implemented quickly without downtime or management needed for secondary indexes.
- With thoughtful design, Cosmos DB can be very cost-effective. The Cosmos DB cost model depends on how the database is designed: the number of collections, partitioning key, index strategy, document size, and number of documents. Pricing for Cosmos DB is based on resources that have been reserved. These resources are called request units, or RUs, and are described in the “Request Units in Azure Cosmos DB” documentation. The clearTREND schema design is implemented as a single document collection, and the entire cost of the solution on Azure, including Cosmos DB, is an affordable monthly price. Keep in mind this is a managed database service, so the monthly cost includes support, 99.999 percent high availability, an SLA for read and write performance, automatic partitioning, data encrypted by default, and automatic backups.
- Programmatically re-size capacity for workload bursts. The clearTREND workload has a predictable daily burst pattern and RUs can be programmatically adjusted. When additional compute resources are needed for complex processing or to meet higher throughput requirements, RUs can be increased. Once the processing completes, RUs are adjusted back down. This elasticity means Cosmos DB can be re-sized in order to cost-effectively adapt to workload demands.
- Push-button globally distributed data. Designing for future scalability of a solution can be tricky, because technology and design choices can become inefficient as a solution grows beyond the initial vision. The advantage with Cosmos DB is that it can become a globally configured, massively scaled-out solution with just a few clicks. There are none of the operational complications of setting up and managing a cloud-scale, NoSQL distributed database.
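The burst re-sizing described in the list above can be sketched as a small scheduling function. This is a minimal Python sketch under stated assumptions, not clearTREND's implementation: the RU values, the burst window, and the names `target_throughput` and `plan_change` are all hypothetical, since the source does not give the actual settings.

```python
from datetime import time
from typing import Optional

# Hypothetical values: the source does not state the real RU settings or burst window.
BASELINE_RUS = 400          # assumed RU/s outside the daily burst
BURST_RUS = 4000            # assumed RU/s during the daily burst
BURST_START = time(1, 0)    # assumed start of the burst window
BURST_END = time(3, 0)      # assumed end of the burst window (exclusive)


def target_throughput(now: time) -> int:
    """Return the RU/s that should be provisioned at the given time of day."""
    in_burst = BURST_START <= now < BURST_END
    return BURST_RUS if in_burst else BASELINE_RUS


def plan_change(current_rus: int, now: time) -> Optional[int]:
    """Return the new RU/s to apply, or None when the current value is already right."""
    target = target_throughput(now)
    return target if target != current_rus else None


# Example: at 02:00 a collection provisioned at the baseline should be scaled up.
new_rus = plan_change(BASELINE_RUS, time(2, 0))
if new_rus is not None:
    # Applying the value is SDK-specific and is left out of this sketch. With the
    # azure-cosmos Python package this would likely be a call such as
    # container.replace_throughput(new_rus); verify against your SDK version.
    print(f"scale to {new_rus} RU/s")
```

A timer-driven job could call `plan_change` on a schedule, so that throughput is only changed when the target actually differs from what is provisioned.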
Design and implementation tips for Cosmos DB
If you are new to Cosmos DB, here are some tips from the clearTREND team to consider when designing and implementing a solution:
- Design the schema around query and API optimization. Schema design for a NoSQL database is just as important as it is for a relational database management system (RDBMS), but it’s different. While a NoSQL database doesn’t require pre-defined table structures, you do have to be intentional about organizing and defining the document schema while also being aware of where and how relationships will be represented and embedded. To guide the schema design, the clearTREND team tends to group data based on the data elements that are written and retrieved by the solution’s APIs.
- Design a flexible partition key. Cosmos DB requires a partition key to be specified when creating a document collection over 10GB. Deciding on a partition key can be tricky because initially it may not be clear what the optimal choice is for a partition key. Should it be a data category, geographical region, ID field, or a time scale like day, week, or month? A poorly designed partition key can create a performance bottleneck called a hot spot which concentrates read and write activity on a single partition rather than distributing activity evenly across partitions. If a partition key has to be changed, it can impact application availability as the underlying data is copied to the new collection and re-indexed. The clearTREND team uses an approach that affords flexibility in setting a partition key. The partition key is a string called PartitionID and initially it was set to be a value that represents a geography. Later when it was realized a more efficient key would be a calculated field, they programmatically replaced the geography values with the calculated values, avoiding a data copy and re-indexing operation.
- Consider a schema design based on a single collection. A common design strategy is to use one document type per collection, but there are benefits to storing multiple document types in a single collection. Collections are the basis for partitioning and indexing, so it may not seem intuitive to store multiple document types in a single collection. But it can maximize functionality, because no cross-collection operations are needed, and minimize overall cost, because a single collection is less expensive than multiple collections. The clearTREND solution has seven different document types, all stored in a single collection. The approach is implemented with an enumerated field called doc type from which all documents are derived. Every document has a doc type property that corresponds to one of the seven document types.
- Tune schema design by understanding the RU costs of complex queries and stored procedure operations. It can be difficult to anticipate the costs for complex queries and stored procedures, especially if you don’t know in advance how many reads or writes Cosmos DB will need to execute the operation. Capture the metrics and costs (RUs) for complex operations and use the information to streamline schema design. One way to capture these metrics is to execute the query or stored procedure from the Cosmos DB dashboard on the Azure portal.
- Consider embedding a simple or calculated expression as a document property. If there are requirements to calculate a simple aggregation like a count, sum, minimum, or maximum, or there is a need to evaluate a simple Boolean logic expression, it may make sense to define the expression as a property of the base document class. For instance, in a logging application there is likely logic to evaluate conditions and determine if an operation has been successful or not. If the logic is a simple Boolean expression like the one below, consider including it in the class definition:
    // C# example of a Boolean expression embedded in a class definition
    public class LogStatus
    {
        // Read-only calculated property derived from the fields below
        public bool Failed => !((WasReadSuccessful && WasOptimizationSuccessful && StatusMsg == "Success")
                                || (WasReadSuccessful && !IsDataCurrent));
        public string StatusMsg { get; set; }
        public bool WasReadSuccessful { get; set; }
        public bool WasOptimizationSuccessful { get; set; }
        public bool IsDataCurrent { get; set; }
    }
The Failed field is defined as a read-only calculated property. If database usage is primarily read intensive, then this approach has the potential to reduce overall RU cost, because the expression is evaluated and stored once, when the document is written, rather than being evaluated each time the document is queried.
- Remember, referential integrity is implemented in the application layer. Referential integrity ensures that relationships between data elements are preserved, and with an RDBMS referential integrity is enforced through keys. For example, an RDBMS uses primary and foreign keys to ensure a product exists before an order for it can be created. If referential integrity is a requirement and data dependencies need to be monitored and enforced, it needs to be done at the application layer. Be rigorous about testing for referential and data integrity.
- Use Application Insights to monitor Cosmos DB activity. Application Insights is a telemetry service and for this solution was used to collect and report detailed performance, availability, and usage information about Cosmos DB activities. Azure Functions provided the integration between Cosmos DB and Application Insights through the use of Metrics Explorer and the capability to capture custom events using TelemetryClient.GetMetric().
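Two of the tips above, storing several document types in a single collection and watching for hot partitions, can be illustrated together. The following is a minimal Python sketch using plain dictionaries in place of a real collection. The property name `docType`, the three example document types, and the helper names are assumptions for illustration; the source only says that there are seven document types and that the partition key is a string called PartitionID.

```python
from collections import Counter
from enum import Enum


class DocType(str, Enum):
    # Hypothetical types: the source mentions seven document types but does not name them.
    PORTFOLIO = "portfolio"
    SIGNAL = "signal"
    LOG = "log"


def make_document(doc_id: str, doc_type: DocType, partition_id: str, **body) -> dict:
    """Build a document carrying the fields shared by every document type."""
    return {"id": doc_id, "docType": doc_type.value, "PartitionID": partition_id, **body}


def of_type(documents: list, doc_type: DocType) -> list:
    """Client-side stand-in for a query filter on the doc type property."""
    return [d for d in documents if d["docType"] == doc_type.value]


def hottest_partition_share(documents: list) -> float:
    """Fraction of documents that land on the most populated PartitionID value."""
    if not documents:
        return 0.0
    counts = Counter(d["PartitionID"] for d in documents)
    return max(counts.values()) / len(documents)


docs = [
    make_document("1", DocType.SIGNAL, "us", symbol="ABC"),
    make_document("2", DocType.SIGNAL, "us", symbol="XYZ"),
    make_document("3", DocType.LOG, "us", StatusMsg="Success"),
    make_document("4", DocType.PORTFOLIO, "eu", owner="demo"),
]

signals = of_type(docs, DocType.SIGNAL)   # the two signal documents
share = hottest_partition_share(docs)     # 3 of the 4 documents share PartitionID "us"
```

Document counts are only a rough proxy for hot spots, which are about read and write activity, but a heavily skewed share is an early hint that the partition key values deserve another look. Against a real collection, the type filter would be a query on the doc type property, for example `SELECT * FROM c WHERE c.docType = "signal"` if the property is named as assumed here.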
Recommended next steps
NoSQL is a paradigm rapidly shifting the way database solutions are implemented in the cloud. Whether you are a developer or database professional, Cosmos DB is an increasingly important player in the cloud database landscape and can be a game changer for your solution. If you haven’t already, get introduced to the advantages and capabilities of Cosmos DB. Take a look at the documentation, dissect the sample GitHub application, and learn more about design patterns:
- Fintech Startup Commercializes Internal tool as a SaaS Product.
- Discover clearTREND, the world’s first cloud-based financial trend engine.
- Try Cosmos DB for free. You get a limited-time, full-service experience. Try it out, run through a tutorial or demo, and step through a quick start without needing an Azure account or credit card.
- If you are a developer, try out the Cosmos DB emulator. Develop and test an application locally without creating an Azure subscription or incurring costs. Once the application works, switch to using Azure Cosmos DB.
Thank you to our partners clearTREND and Skyline Technologies!
One of the great things about working for Microsoft is the opportunity to work with customers and partners, and to learn through them about their creative approaches for implementing technology. The team that designed and implemented the clearTREND solution are architects and developers with Skyline Technologies. Passionate about their business clients and solving complex technical challenges, they were very early cloud adopters. We especially appreciate the team members who gave their time to this effort, including Tim Miller, Greg Levenhagen, and Michael Lauer. It’s been a pleasure working with you.