Not getting 10,000 unique package names #7

Open
nazanaza2970 opened this issue Feb 20, 2024 · 4 comments

@nazanaza2970

I am using the raw.json file to get the list of package names. Here is the code:

import json

# Load the scraped package list and count the distinct package names.
with open("jplabfiles/raw.json", encoding="utf-8") as f:
    data = json.load(f)

package_names = [item["name"] for item in data]
print(len(set(package_names)))

I am getting only 5247 unique package names instead of the expected 10,000. Please let me know if I am doing something wrong.

@LeoDog896
Owner

Interesting. That script is right, so this may be an NPM API problem. I'll see what I can do.

@Zhengqbbb

Zhengqbbb commented May 13, 2024

[Screenshot: CleanShot 2024-05-13 at 16 55 37@2x]

  1. It looks like the package list contains many repeated names.
  2. Even when I increase the number of requests, the number of unique packages is always 5247.

🫠🫠🫠

[Screenshot: CleanShot 2024-05-13 at 17 08 15@2x]
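
To quantify the repetition described above, something like the following could help. It is only a sketch: it assumes each entry is a dict with a `name` key, as in the script from the original post, and the `duplicate_names` helper and the `sample` list are hypothetical stand-ins rather than anything from this repository.

```python
from collections import Counter


def duplicate_names(items):
    """Return {name: count} for every package name that appears more than once."""
    counts = Counter(item["name"] for item in items)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample standing in for the entries loaded from raw.json.
sample = [{"name": "a"}, {"name": "b"}, {"name": "a"}, {"name": "c"}, {"name": "a"}]

dupes = duplicate_names(sample)
print(len(dupes), "names appear more than once")
print(dupes)
```

Running `duplicate_names(data)` on the real list should show which names are repeated and how often, which may make it easier to tell whether the duplicates are spread evenly or concentrated in a few packages.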

@LeoDog896
Owner

Since this issue got bumped again, I did a more formal investigation into the NPM registry, and it seems the registry itself starts returning duplicate package names after a certain point. I've sent them an email and will report back here when they reply.
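
For anyone who wants to check where the duplication starts in their own data, here is a rough sketch. It assumes the names are kept in the order they were fetched; `first_repeat_index` is a hypothetical helper, not part of this project or of any NPM API.

```python
def first_repeat_index(names):
    """Return the index of the first name that was already seen earlier, or None."""
    seen = set()
    for i, name in enumerate(names):
        if name in seen:
            return i
        seen.add(name)
    return None


# Hypothetical ordered list of names, standing in for the fetched results.
ordered = ["a", "b", "c", "b", "a"]
print(first_repeat_index(ordered))
```

If the returned index on the real list lines up with a page boundary in the requests, that would be consistent with the registry repeating results past a certain offset, though it would not prove it.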

@LeoDog896
Owner

LeoDog896 commented May 30, 2024

They have not responded via email. I've filed a PR/issue? on the registry repository, but I'm afraid it probably won't get any attention. If there's a way to get the entire NPM info dump, that would be great.

I'll try using the Replicate API later (maybe I can also get a bigger list of packages!)
