
The profile is currently unclaimed by the seller. All information is provided by CB Insights.

streamlit.io

Founded Year

2018

Stage

Acquired

Total Raised

$62M


About Streamlit

Streamlit is an app framework built specifically for machine learning and data science teams. Streamlit aims to turn data scripts into shareable web apps in pure Python. On March 2, 2022, Streamlit was acquired by Snowflake.

Streamlit Headquarter Location

777 Oak St

Millbrae, California, 94030,

United States

Predict your next investment

The CB Insights tech market intelligence platform analyzes millions of data points on venture capital, startups, patents, partnerships, and news mentions to help you see tomorrow's opportunities, today.

Research containing Streamlit

Get data-driven expert analysis from the CB Insights Intelligence Unit.

CB Insights Intelligence Analysts have mentioned Streamlit in 3 CB Insights research briefs, most recently on Mar 8, 2022.

Expert Collections containing Streamlit

Expert Collections are analyst-curated lists that highlight the companies you need to know in the most important technology spaces.

Streamlit is included in 1 Expert Collection, including Artificial Intelligence.

Artificial Intelligence

9,093 items

This collection includes startups selling AI SaaS, using AI algorithms to develop their core products, and those developing hardware to support AI workloads.

Latest Streamlit News

Searching For Semantic Similarity!

Aug 5, 2022

Originally published on Towards AI, the world's leading AI and technology news and media company.

This blog post is all about what we look for while making friends! Jokes apart, in this project we will learn how to compute the similarity between a search query and a database of texts. We will also rank all the data by similarity score and retrieve the most similar text along with its index.

Background

Before we jump into how the project works, it is important to understand its applications. Practically all day we run into similarity algorithms, some advanced and some very basic but useful, such as cosine similarity. Techno-legal domain data, such as patents, rely heavily on this task when searching for overlapping technologies or inventions. Search methods can be corpus-based, relying on large corpora of information to draw semantics between concepts, or deep-learning based, where neural networks compute both the embeddings and the distance. [1] In this blog, we will discuss one example of each method and see how cosine similarity lets us leverage distance information across these two different embedding approaches.

Dataset & Preprocessing

In our experiments, we use the Stack Overflow Questions Dataset, which contains 60,000 Stack Overflow questions from 2016–2020, and attempt to find the questions most similar to our query. We use the function read_sc to read our search criteria, which can be entered either as a string or as a path to a text file containing it, and the function clean to load the input file into a dataframe.
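As a minimal sketch (the post does not show the bodies of its read_sc and clean helpers, so the exact behavior here is an assumption; this version uses only the standard library and a plain list in place of a dataframe):

```python
import csv
import os

def read_sc(search_criteria):
    """Return the search text, whether given as a literal string
    or as a path to a text file containing it (assumed behavior)."""
    if os.path.isfile(search_criteria):
        with open(search_criteria, encoding="utf-8") as f:
            return f.read().strip()
    return search_criteria.strip()

def clean(csv_path, column):
    """Load one column of a CSV file into a list of strings,
    a stand-in for the dataframe loading described in the post."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row[column] for row in csv.DictReader(f)]
```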
Preprocessing the text plays a key role for corpus-based embeddings such as Word2Vec and fastText. While domain-specific techniques such as contracting text or expanding abbreviations can improve the algorithms, our data is comparatively clean, so we opt for standard NLP preprocessing. The function below takes a string as input, removes punctuation and stopwords, and returns a list of tokens.

Exploring The Embeddings

Sentence-BERT

The algorithm uses the paraphrase-MiniLM-L6-v2 model from Hugging Face to generate the embeddings. The model maps phrases and paragraphs to a 384-dimensional dense vector space, which makes it well suited to tasks like clustering and semantic search. Instead of using the sentence-transformers package, which adds to the slug size when deploying the model, we pass our input to the transformer model directly and apply the right pooling operation on top of the contextualized word embeddings. This transformer model was also versioned with DVC.

FastText

But why use fastText when we already have superior BERT-based embeddings in place, you ask? When it comes to long-form documents, nothing beats the good old fastText! Sentence-BERT embeddings have a hard limit of 512 word pieces, and this length cannot be increased. Corpus-based embeddings such as Word2Vec and fastText have no such limitation. FastText also operates at a finer level, using character n-grams: words are represented by the sum of their character n-gram vectors, in contrast to Word2Vec, which works with whole words. As a result, we never run into the classic "out of vocabulary" error with fastText, and it handles words absent from its vocabulary well, across several languages.
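The preprocessing function described above might look like the sketch below. The post's exact stopword list is not shown, so a small inline set stands in for a fuller one (e.g. NLTK's):

```python
import string

# Tiny stand-in stopword list; the original likely uses a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "in", "on",
             "of", "to", "and", "for", "this", "that", "it", "with"}

def preprocess(text):
    """Lowercase, strip punctuation, split into tokens, drop stopwords."""
    table = str.maketrans("", "", string.punctuation)
    tokens = text.lower().translate(table).split()
    return [t for t in tokens if t not in STOPWORDS]
```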
Calculating The Similarity With Cosine Similarity

[Illustration: Understanding cosine similarity]

As depicted in the illustration above, cosine similarity measures the angle between two vectors (or embeddings). There are numerous other similarity measures, such as Euclidean distance and Jaccard similarity, but cosine similarity performs better than these: it is not just a measure of common words or the magnitude of overlap between concepts, but takes into account the orientation of the vectors in the embedding space, so it can accurately calculate similarity even when the documents are of very different sizes. The basic computation takes the dot product of two vectors a and b and divides it by the product of their norms. We explore PyTorch- and SciPy-based cosine similarity functions in our code: we compute the embeddings of the search text and of the sentences in our dataset, then use the util.pytorch_cos_sim function for Sentence-BERT and scipy.spatial.distance.cosine (which returns the cosine distance, i.e., one minus the similarity) for fastText.

Structuring The Code

We have a main.py file that takes arguments such as the model name, the search criteria, the file containing all the data, and the name of the column to search for similar text. The main function calls one of the two classes, defined for the BERT-based and fastText embeddings respectively, according to the argument you choose.

Analyzing The Algorithms

Since we are not dealing with a traditional supervised learning problem where we train a model and log its accuracy and hyperparameters, one challenge was deciding how to log and evaluate performance. While the models return an output file ranked by similarity to the search text, I also decided to log the most similar sentence scored by each model individually, along with its index, to see whether the models capture the semantics in a similar fashion. Computation time could be an interesting comparison parameter as well.
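For reference, the basic cosine-similarity computation between two vectors a and b (dot product divided by the product of the norms) can be written in plain Python:

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Identical directions score 1.0, orthogonal vectors score 0.0, regardless of vector magnitude.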
Let's take a phrase and check the results!

Phrase: "Why is Java Optionals immutable?"

Both algorithms select the same match, with nearly identical scores.

[Logs for Sentence-BERT and fastText]

Now The Streamlit App!

pip install -r requirements.txt

[Streamlit app for similarity calculation]

Simply enter the string of text you want to search for and upload the .csv file you want to search in. You will also need to enter the name of the column and choose between the two models. Here's how the output looks:

[Similarity results: Streamlit web app]

Streamlit is easy to use; inputting the data and running the calculation function takes only a few lines.

Deploying The WebApp On An AWS EC2 Instance

We decided to deploy our app on an AWS EC2 instance, and this blog is the holy grail for it. There are five steps to launch your EC2 instance:

1. Select an AMI (Amazon Machine Image); we go for the free-tier-eligible one.
2. Choose an instance type.
3. Create and download an RSA-type key pair.
4. Modify the network settings and create a new security group: name it, then add two rules of type "Custom TCP", setting the port range to 8501 for one and 8502 for the other.
5. Select the storage you require and launch the instance!

Now your instance is running! I follow the easiest way to connect to it, and you can follow along: simply select the instance and click Connect, and there you have the terminal right in front of you. If you have followed along so far, you already have your model and data pushed to DVC (and the DagsHub repository); all we need to do now is pull the model using a simple script that prepares the EC2 instance and launches our app. The script I followed is the same as in the blog referred to above.
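As a rough sketch of how such an app wires Streamlit input widgets to the ranking logic (the helpers bow_vector, rank_texts, and run_app, the widget labels, and the toy bag-of-words embedding are illustrative assumptions, not the author's actual code):

```python
from collections import Counter

def bow_vector(text, vocab):
    """Toy bag-of-words embedding; a stand-in for Sentence-BERT or fastText."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]

def rank_texts(query, texts, embed, sim_fn):
    """Embed the query and all texts, then return (index, score) pairs
    sorted by descending similarity."""
    q = embed(query)
    scores = [(i, sim_fn(q, embed(t))) for i, t in enumerate(texts)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def run_app():
    """Streamlit UI; executed when the script runs under `streamlit run`."""
    import pandas as pd
    import streamlit as st

    query = st.text_input("Enter the search text")
    uploaded = st.file_uploader("Upload a .csv file", type="csv")
    column = st.text_input("Column to search in")

    if st.button("Compute similarity") and uploaded and query and column:
        df = pd.read_csv(uploaded)
        texts = df[column].astype(str).tolist()
        vocab = sorted({w for t in texts + [query] for w in t.lower().split()})
        dot = lambda a, b: sum(x * y for x, y in zip(a, b))
        cos = lambda a, b: dot(a, b) / ((dot(a, a) * dot(b, b)) ** 0.5 or 1.0)
        ranked = rank_texts(query, texts, lambda t: bow_vector(t, vocab), cos)
        st.write(df.iloc[[i for i, _ in ranked]])

# In a real app.py, call run_app() at module scope so `streamlit run` executes it.
```

The real app would swap bow_vector for the Sentence-BERT or fastText embedding chosen by the model selector.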

Streamlit Web Traffic

[Chart: Streamlit rank, page views per user (PVPU), page views per million (PVPM), and reach per million (RPM)]

  • When was Streamlit founded?

    Streamlit was founded in 2018.

  • Where is Streamlit's headquarters?

    Streamlit's headquarters is located at 777 Oak St, Millbrae.

  • What is Streamlit's latest funding round?

    Streamlit's latest funding round is Acquired.

  • How much did Streamlit raise?

    Streamlit raised a total of $62M.

  • Who are the investors of Streamlit?

    Investors of Streamlit include Snowflake Computing, Gradient Ventures, GGV Capital, Sequoia Capital, Daniel Gross and 7 more.

  • Who are Streamlit's competitors?

    Competitors of Streamlit include Plotly.

You May Also Like

Plotly Logo
Plotly

Plotly is a collaborative platform for analyzing, graphing, and sharing data. It's like GitHub, for data and graphs. Users get powerful analytical tools to make sense of data with beautiful graphs. The product is online, social, and collaborative, meaning you don't have to work alone or download tools. You can do all your coding, analytics, graphing, and collaboration inside Plotly.

GitHub Logo
GitHub

GitHub is a social network to share code with friends, co-workers, classmates, and strangers. Users have access to free public repositories, collaborator management, issue tracking, wikis, downloads, code review, graphs, and more. The hosted service GitHub.com is free for open-source projects and, according to much of the developer community, has helped to improve open-source collaboration. On October 26, 2018, GitHub was acquired by Microsoft at a valuation of $7.5B.

